A THEORY OF LEARNING AND TRANSFER: I*


HAROLD GULLIKSEN AND DAEL L. WOLFLE 
The University of Chicago 


A rational theory of discrimination learning is developed for the 
special case in which the subject must discriminate between two 
stimuli which differ with respect to one variable such as size or 
brightness. It is shown that the previous equations developed by 
Gulliksen and Thurstone are special cases of the present one. It is 
predicted that the ultimate level of accuracy of the discrimination is
inversely related to the difference, as determined psychophysically,
between the two stimuli. Other implications of the theory for experimental
work are presented.


INTRODUCTION 


This paper will develop a rational theory which will describe the 
course of learning of a sensory discrimination; predict the relative 
difficulty of different types of discrimination; and predict the nature 
and accuracy of transfer of the learned response to new stimuli. The 
prediction of both learning and transfer data from the same postu- 
lates indicates an order and unity underlying these two processes 
which has not previously been demonstrated. 

The theory is considerably more general than the previous devel- 
opments of Thurstone (28, 29) and Gulliksen (4) which were con- 
fined to a description of the course of learning. These earlier theories 
will be shown to be special cases of the present one. 

The present theory unites the three fields of learning, transfer, 
and psychophysics, in that it utilizes information obtained from psy- 
chophysics regarding the psychological similarity and dissimilarity of 
stimuli to assist in predicting the difficulty of learning and transfer 
experiments involving those stimuli. 

The nature of the relationship between psychophysics and learn- 
ing may be stated briefly as follows. Other things being equal, if two 
stimuli are very much alike, as determined by psychophysical meth- 
ods, it will be more difficult to learn to respond differentially to them 
than if they are very different. The phrase “other things being equal” 
takes care of such variables as the learning ability of the animal and 
its initial attainment. 

* We are grateful to the members of Professor Thurstone’s Seminar in
Mathematical Psychology for criticism of this paper and particularly to Mr. John
Reiner for assistance in the derivations involved.
The relationship between psychophysics and transfer may be
similarly stated. Other things being equal, the more alike two stimuli 
are, the greater is the probability that a response learned to one will 
transfer to the other. 

These two statements imply that an inverse relationship exists 
between learning and transfer experiments. Where learning to dis- 
criminate between two stimuli is difficult, transfer is easy, and vice 
versa. 

It will be shown that the theory leads to the logical deduction of 
a number of experimentally known facts regarding discrimination. 
Further, it predicts a number of facts and relationships which have 
never been tested experimentally. Experimental tests of these deduc- 
tions will provide a crucial test of the adequacy of the theory. Until 
the psychophysical experiments necessary to determine the degree of 
similarity or dissimilarity of the stimuli used have been performed, it
will be impossible to predict the quantitative results of any single ex- 
periment. At present the theory is limited to the prediction of certain 
relationships to be found between experimental results in psycho- 
physics, learning, and transfer. 

In order to explain the development of the theory we will follow 
through an analysis of the problems of visual discrimination in the 
white rat. The theory is by no means limited to visual discrimination 
data, but the development of the theory will perhaps be clearer if it 
is explained consistently with one type of problem, as an example. 

The theory will be applied to the usual type of discrimination ex- 
periment, using apparatus such as that devised by Yerkes and Watson 
(34), Lashley (18), Fields (2), or Munn (21), in which the animal 
receives a food reward for selecting the correct one of two or more 
similar openings. Before beginning the experiment proper, the ani- 
mal is taught to find food in the apparatus. When formal training is 
started, the openings are closed with doors which are marked with the 
stimuli to be discriminated. The stimuli usually consist of geometric 
figures which differ in size, color, form, or brightness. Whenever the 
animal approaches the door bearing the positive or correct stimulus 
it can get through to receive a food reward. Whenever it approaches 
the door bearing the negative stimulus it can not get through, and 
may receive some form of punishment. The two stimuli are presented 
in the right-left and the left-right order at random to prevent the ani- 
mal from developing a position habit. The discrimination is said to 
have been learned when some arbitrary criterion of correctness is 
reached. 

From the viewpoint of the present theoretical development, the 
essential features of such experiments may be listed as follows:

1. The response or responses involved have already been learned
by the subject so that the acquisition of a motor skill is not a feature
of the experiment.

2. The subject is presented simultaneously with two (or more) 
stimuli, for example, a bright and a dim light or a large and a small
circle. 

3. The subject is rewarded for making a certain response to the 
situation such as choosing the brighter light or jumping to the smaller 
circle; and punished, or not rewarded, for making other responses 
such as choosing the dimmer light or jumping to the larger circle. 


Definition of Stimulus 


Various suggestions have been made regarding the aspect of the 
total situation to which the animal responds. Lashley (19) has re- 
cently discussed the several possibilities of the basis of the animal’s 
response. The two most clearly contrasted hypotheses are: (1) that 
the animal responds positively to one of the stimuli or negatively to 
the other; and (2) that it responds to the relationship between the 
stimuli, i.e., greater than, dimmer than, etc. The first of these definitions
implies a response to the absolute characteristics of the situation,
and the second implies a response to the relative characteristics.


In contrast to both of these definitions, we suggest that the animal
responds to the total stimulus configuration consisting of two stimuli
presented simultaneously in a given spatial order. The theory here
presented will be based on this definition. This definition does not imply
that the animal must see all parts of the configuration with equal
clarity at any one time. The animal may scan the configuration or examine
it part by part. The response, however, is, in terms of the present
definition, made to the total configuration whether that total be
seen at once or built up as a construct of successive examinations of
the parts.

To avoid confusion of terms, we will use the word “stimulus” in
the conventional sense to refer to the individual lights, colors, or other
stimuli. We will use the word “configuration” to mean the pair of
stimuli in a given spatial order. Two configurations made up of the
same two stimuli, but in the opposite orders, are shown as a and b in
figure 1. The statements about stimuli and configurations apply to
such variables as size, brightness, hue, and saturation. Size is used
in the illustrations merely because it is easiest to represent graphically.

The conventional account of the discrimination experiment describes
the animal as choosing between two simultaneously presented
stimuli; our account will describe the animal as responding differently
to two successively presented configurations.

FIGURE 1
Two configurations made up of the same two stimuli but in opposite orders.

FIGURE 2
Plot of the two configurations shown in figure 1. This plot provides a first
approximation to the problem of scaling the configurations. (Horizontal axis:
stimulus on right; vertical axis: stimulus on left; logarithmic scales from 1 to 128.)


Psychological Scaling of Configurations 


All configurations which might be used in an ordinary two-choice 
size discrimination problem may be represented graphically, as fol- 
lows: Represent the size of the stimulus which appears at the right 
by the horizontal axis and the size of the stimulus at the left by the 
vertical axis; then point A on figure 2 represents configuration a of 
figure 1 and point B of figure 2 represents configuration b. This meth- 
od of plotting is not limited to size discrimination but may be used to 
represent the configurations used in any two-choice problem requiring 
discrimination with respect to one variable such as brightness, pitch, 
loudness, saturation, etc. In general, any configuration composed of 
two stimuli differing with respect to one variable may be represented 
as a point on a two-dimensional plot. Conversely, any point on such 
a plot represents a configuration composed of two stimuli. On such 
a graph, the two configurations consisting of two stimuli used in re- 
verse order would be indicated by two points symmetrically placed 
with respect to a 45° line through the origin. 
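The plotting rule can be sketched in a few lines. This is a minimal illustration, assuming a configuration is stored as a (left, right) pair of stimulus sizes; the names `Configuration`, `to_point`, `reverse` and `mirror_45` are hypothetical and not part of the paper. It checks that a configuration and its reversal fall symmetrically about the 45° line.

```python
from typing import NamedTuple


class Configuration(NamedTuple):
    """Two stimuli in a fixed spatial order (illustrative representation)."""

    left: float   # size of the stimulus on the left
    right: float  # size of the stimulus on the right


def to_point(config: Configuration) -> tuple:
    """Plot position: right-hand stimulus on the x axis, left-hand on the y axis."""
    return (config.right, config.left)


def reverse(config: Configuration) -> Configuration:
    """The same two stimuli presented in the opposite spatial order."""
    return Configuration(left=config.right, right=config.left)


def mirror_45(point: tuple) -> tuple:
    """Reflect a point across the 45-degree line y = x through the origin."""
    x, y = point
    return (y, x)


a = Configuration(left=8.0, right=32.0)   # illustrative sizes, not from the paper
b = reverse(a)
print(to_point(a), to_point(b))           # (32.0, 8.0) (8.0, 32.0)
assert to_point(b) == mirror_45(to_point(a))
```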

Such a graph suggests the possibility of a closer relationship be- 
tween learning theory and psychophysics. The degree of similarity of 
two configurations may be regarded as a psychological “distance.” The 
more similar two configurations are, the less is the “distance” be- 
tween them; the greater the difference between two configurations, 
the greater is the “distance” between them. If a psychophysical method
of determining the distance between two configurations can be
worked out, the resulting scale would represent psychological distances 
between the configurations involved. Such a method, if generalized to 
the n-dimensional case, would be applicable to configurations involv- 
ing many stimuli and also to stimuli of any type, including, for ex- 
ample, complex form discrimination. Young and Householder (35) 
have recently described a matrix method of determining the dimen- 
sionality of a set of points where the distances between the points are 
known. Scaling the configurations, while necessary in order to measure
the distances accurately, is not necessary for a general discussion
of the relationships involved.

Since the Fechnerian principle of a logarithmic relationship be- 
tween stimuli and sensations is well recognized, we have adopted it as 
a first approximation. Hence, in figure 2 the configurations are plotted 
on the basis of a logarithmic principle. 
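The text leaves the distance measure open. As a first-approximation sketch only, assume Fechnerian coordinates (the base-2 logarithm of each stimulus size, matching the doubling scale of figure 2) and take ordinary Euclidean distance between the plotted points; the Euclidean metric and the helper names `log_point` and `distance` are illustrative assumptions, not the authors' scaling method.

```python
import math


def log_point(left: float, right: float) -> tuple:
    """Fechnerian plot position: (log2 of right size, log2 of left size)."""
    return (math.log2(right), math.log2(left))


def distance(p: tuple, q: tuple) -> float:
    """Euclidean distance between two plotted configurations (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


a = log_point(left=8, right=32)    # (5.0, 3.0)
b = log_point(left=32, right=8)    # (3.0, 5.0)
print(round(distance(a, b), 3))    # 2.828

# Doubling every size moves both points one unit along the 45-degree line,
# so on the log scale the distance between the pair does not change.
a2 = log_point(left=16, right=64)
b2 = log_point(left=64, right=16)
assert math.isclose(distance(a, b), distance(a2, b2))
```

Under this assumption the distance depends only on the ratio of the two sizes, which is one way of reading the Fechnerian principle adopted above.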


Definition of Response 


The response made by a rat in a discrimination experiment would 
usually be defined in terms of the relationship to which the animal 
was being trained to respond. This definition follows from the belief 
that the most important aspect of the stimulus situation is the rela- 
tionship between the two stimuli used. Since we have elected to use 
a configurational definition of the stimulus, it is necessary to change 
the definition of response. 

Révész (24) and others have developed a technique for determining
which of several aspects of the stimulus, such as form or color,
is dominant. By an extension of this technique it would be possible
to define the response for any species in terms of the dominant aspect 
of the stimulus. The ease with which rats learn maze habits and the 
difficulty of eliminating position habits in training rats on discrimination
problems suggest that directional habits may be the dominant
type of response in this species. On the basis of such evidence, and 
experimental data to be presented later, the rat’s response may be de- 
fined directionally: The rat learns to jump to the right when present- 
ed with one configuration, and to jump to the left when presented with 
the other configuration. Our definition of the rat’s response in a dis- 
crimination experiment is here stated quite arbitrarily. Later it will 
be demonstrated that it is possible, on the basis of this definition, to 
predict such diverse facts as the transfer to new configurations on a 
relational basis and the greater difficulty of learning an absolute than 
a relative discrimination. The prediction of experimentally known 
facts on the basis of a directional definition of response provides addi- 
tional evidence for the value of such a definition. 
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It is also possible to test the definition directly by experiment. 
Since the training prior to the transfer tests normally eliminates the 
possibility that the animal will respond to the test situations on a positional
basis, it is impossible to determine from such experiments
whether the original learning should be regarded as a relative bright- 
ness habit or as two directional habits. A crucial experiment must 
allow the animal to transfer on either basis. Such an experiment 
would present the subject with the same configuration (the same two 
stimuli in the same order) on every trial. The animal would then learn 
to go, for example, to the brighter light on the right-hand side. After 
the attainment of a fairly high level of accuracy, new configurations 
could be presented and the subject rewarded no matter how it re- 
sponded. The definition of the response would then depend upon the 
outcome of these transfer tests. If the subject went to the right on 
the test trials, we would define the response as being a jump to the 
right. If it went to the brighter of the two stimuli, the response would 
be defined as being the brighter side. To distinguish between these 
two alternatives, we performed the following experiment. 

The subjects were 24 female hooded rats from the colony main- 
tained by the Department of Psychology at the University of Chicago. 
The rats were trained on the Lashley jumping apparatus, in which the 
animal is confronted with two similar apertures in each of which a 
card can be placed. The cards bear the stimuli to be discriminated. 
The rat jumps from a small platform some 9 inches from the cards. 
If it jumps against the correct card, the card falls backward and the 
animal lands on a platform where it finds a food reward. A block be- 
hind the wrong card holds it securely in place so that if the rat jumps 
against the wrong card it falls into a cloth net some 20 inches below
the cards and jumping platform. The fall constitutes punishment for 
an incorrect choice. Training was continued until the rat attained a 
criterion of 50 consecutive errorless trials (10 trials per day) in response
to the same configuration (the same two stimuli in the same
spatial arrangement). It was then tested on a series of 16 different 
configurations. The stimulus cards were black with white circles on 
them. Five sizes of circles were used. The areas of the circles increased 
in the geometric ratio 1:3:9:27:81; the actual areas were 0.26, 0.78, 
2.35, 7.06, and 21.15 square inches. For training trials a circle of 0.78 
sq. in. was paired with one of 7.06 sq. in. The test configurations consisted
of 16 of the 25 possible pairings of the five circles. The training
and test configurations are shown in figure 3 and plotted in figure 4. 
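The stimulus sizes can be checked directly from the five areas quoted above. The sketch confirms that successive areas stand in a ratio close to 3, indexes them 1 to 5 as in figure 3, and counts the 25 possible ordered pairings. Which 16 pairings served as test configurations is shown only in figure 3, so the code does not attempt to list them.

```python
import itertools
import math

# Circle areas in square inches, as reported in the text (sizes 1 to 5).
areas = [0.26, 0.78, 2.35, 7.06, 21.15]

# Ratio of each area to the next smaller one; the nominal ratio is 3.
ratios = [big / small for small, big in zip(areas, areas[1:])]
print([round(x, 3) for x in ratios])   # [3.0, 3.013, 3.004, 2.996]

# On a base-3 logarithmic scale the five sizes are almost equally spaced.
steps = [1 + math.log(a / areas[0], 3) for a in areas]
print([round(s, 1) for s in steps])    # [1.0, 2.0, 3.0, 4.0, 5.0]

# Every ordered pairing (left size, right size) of the five circles.
pairings = list(itertools.product(range(1, 6), repeat=2))
print(len(pairings))                   # 25

# Training paired circle 2 (0.78 sq. in.) with circle 4 (7.06 sq. in.);
# presumably configurations a and b are the two spatial orders of this pair.
training = {(2, 4), (4, 2)}
assert training <= set(pairings)
```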

FIGURE 3
The test and training configurations.
One-half of the animals were trained on configuration a; the other half on
configuration b. Each animal was tested on the other 16 configurations shown.
The numbers under each configuration refer to the sizes of the circles, 1 being the
smallest and 5 the largest of the five sizes used.

FIGURE 4
Plot of the training and test configurations.
Each animal was trained on one of the two circled configurations and tested
on the 16 test configurations.

We followed the customary procedure by first training the animals
to jump through open windows with no stimulus cards in them.
When this problem was mastered, a white card was placed in one window
and a black card in the other. The positions of the two cards were
alternated in random fashion. Not until the animal mastered the problem
of jumping to the white card was formal training with the stimuli
to be discriminated begun. But training the animals to jump to a
white card when it is paired with a black one has the effect of train- 
ing them to jump to the brighter of the two cards. This procedure, 
then, gives training on a light-dark discrimination before any formal 
records of the experiment are taken. 

Because some such training seems necessary with the Lashley 
jumping procedure, we used it with half of our animals. With the 
other half, we reversed the preliminary training and made the ani- 
mals jump to the black card and avoid the white one. The reversal of 
the usual technique had, of course, the effect of training these animals 
to choose the darker in preference to the lighter stimulus during pre- 
liminary training. 

Twelve animals were given each type of preliminary training. 
Half of each of these two groups was presented on all regular trials 
with configuration a of figure 3 and half with configuration b of figure 
3. Half of each of these four groups was trained to jump to the right 
and half to jump to the left. There were then eight different combina- 
tions of preliminary training, configuration, and response. Three ani- 
mals were trained on each combination. These eight groups, with the 
preliminary and training conditions summarized for each are described 
in the first seven columns of table 1. 
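The design is a full 2 × 2 × 2 crossing. The sketch below enumerates the two kinds of preliminary training, the two training configurations and the two rewarded directions, giving eight combinations; three animals per combination gives the 24 subjects. The enumeration order is arbitrary and is not the group numbering of table 1.

```python
import itertools

PRELIMINARY = ("jump to white", "jump to black")   # preliminary training
CONFIGURATION = ("a", "b")                         # training configuration, figure 3
DIRECTION = ("right", "left")                      # side rewarded in regular training
ANIMALS_PER_COMBINATION = 3

# Full crossing of the three two-level factors.
combinations = list(itertools.product(PRELIMINARY, CONFIGURATION, DIRECTION))
for prelim, config, side in combinations:
    print(f"{prelim:15} configuration {config}  rewarded {side}")

print(len(combinations), len(combinations) * ANIMALS_PER_COMBINATION)   # 8 24
```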

Training was continued at the rate of 10 trials per day until the 
animal had made no errors on five consecutive days. If an animal 
made errors on the first trial of the first day, but made none there- 
after, the first 5 days were accepted as satisfying the learning cri- 
terion. Learning was so rapid that this criterion was met by most 
animals in five or six days (including the criterion runs). 

Transfer tests were introduced after satisfying the learning cri- 
terion. These tests were given on the sixth and twelfth trials of each 
day’s series. The remaining 10 trials were on the training configura- 
tion. The animals were rewarded on test trials whether they jumped 
to the right or to the left. The different animals were given from two 
to six tests on each of the test configurations. 
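As a sketch of one testing day, assuming (as we read the text) that the two test trials were added to the ten training trials to make a 12-trial series; `daily_schedule` is a hypothetical helper.

```python
def daily_schedule(test_trials=(6, 12), training_trials=10):
    """Trial types for one day; trial numbers are 1-based.

    Assumes the two test trials are added to the ten training trials,
    giving a 12-trial series.
    """
    total = training_trials + len(test_trials)
    return ["test" if t in test_trials else "train" for t in range(1, total + 1)]


day = daily_schedule()
print(day.count("train"), day.count("test"), len(day))                # 10 2 12
print([t for t, kind in enumerate(day, start=1) if kind == "test"])   # [6, 12]
```

On the "test" trials the animal is rewarded whichever side it chooses; on the "train" trials the usual reward and punishment apply.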

Column 9 of table 1 gives the frequencies, and percentages, of
jumps in agreement with the hypothesis that the responses were made 
on a relational basis and column 10 gives corresponding data with re- 
spect to the hypothesis that they were made on a directional basis. The 
basis which each rat apparently used is named in column 11. A direc- 
tional hypothesis was recorded whenever the percentage of test jumps 
in agreement with that hypothesis was greater than the percentage of 
test jumps in agreement with the relational hypothesis. Whenever
the reverse was true, a relational hypothesis was recorded. 
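The scoring rule can be made concrete with a hypothetical sketch. For a test configuration given as (left size, right size), the directional hypothesis predicts the side that was correct in training, and the relational hypothesis predicts the side bearing the larger (or smaller) circle. Because both percentages are taken over the same test jumps, comparing counts is equivalent. The text does not say how ties were scored, so the sketch reports them as "undetermined"; the function names and data layout are assumptions.

```python
def predictions(test, trained_side, trained_to_larger):
    """Predicted jump side for a test configuration under each hypothesis.

    test: (left_size, right_size).
    trained_side: side ("left" or "right") that was correct in regular training.
    trained_to_larger: True if the correct side carried the larger circle.
    """
    left, right = test
    directional = trained_side                 # same side as in training
    if left == right:
        relational = None                      # no larger or smaller circle
    elif (right > left) == trained_to_larger:
        relational = "right"
    else:
        relational = "left"
    return directional, relational


def classify(jumps, tests, trained_side, trained_to_larger):
    """Label one rat by comparing agreement with the two hypotheses."""
    agree_dir = agree_rel = 0
    for jump, test in zip(jumps, tests):
        directional, relational = predictions(test, trained_side, trained_to_larger)
        agree_dir += jump == directional
        agree_rel += jump == relational
    # Both percentages share the same denominator, so comparing counts suffices.
    if agree_dir > agree_rel:
        return "directional"
    if agree_rel > agree_dir:
        return "relational"
    return "undetermined"   # ties: the text does not say how they were scored


# Hypothetical rat trained to jump right, to the larger circle.
tests = [(4, 2), (5, 1), (1, 3), (3, 1)]
jumps = ["right", "right", "right", "left"]
print(classify(jumps, tests, "right", True))   # directional
```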

When the preliminary training required the animal to jump to the 
white card and the regular training required it to jump to the side of 
the larger circle (Groups 1 and 2), or, when the preliminary training 
required the animal to jump to the black card and the regular training 
required it to jump to the side of the smaller circle (Groups 7 and 8), 
the preliminary and regular training both reinforced a tendency to 
respond to the larger (brighter) or smaller (dimmer) of the two stim- 
uli. Under these conditions, seven of the 12 animals of groups 1, 2, 7, 
and 8 responded to the test configurations on a relational basis. The 
other five responded on a directional basis. 

When the preliminary training required the animal to jump 
to the white card and the regular training required it to jump to the 
side of the smaller circle (Groups 3 and 4), or, when the preliminary 
training required the animal to jump to the black card and the regular 
training required it to jump to the side of the larger circle (Groups 5 
and 6), the regular training conflicted with any tendency, which may 
have been established by the preliminary training, to respond on a 
relational basis. Under these conditions all 12 animals of Groups 3, 4,
5, and 6 responded to the test configurations on a directional basis. 
(Of the 506 test jumps, 503 were to the side which had been correct 
during regular training.) 

The difference in the behavior of the groups trained to go to the
white card and those trained to go to the black card indicates that
this preliminary training, which is normally not recorded as part of
the experimental data, is important in determining how the rat will
respond on the subsequent regular training. This influence has hitherto
been neglected in most studies of discrimination.

The results as a whole indicate that a directional habit was the 
more frequent type of response learned under these experimental con- 
ditions. Seventeen of the 24 animals responded to the test configura- 
tions on a directional basis and only seven on a brightness basis. In 
conclusion, it seems considerably safer to assume that the rats’ initial 
response in the discrimination problem is directional rather than rela- 
tive. Consequently, we will define the response directionally as a jump 
to the right or a jump to the left. 
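The totals above follow directly from the counts given for the two sets of groups; the short check below simply re-adds them.

```python
# Counts as reported for the two sets of groups.
consistent = {"relational": 7, "directional": 5}    # groups 1, 2, 7, 8
conflicting = {"relational": 0, "directional": 12}  # groups 3, 4, 5, 6

totals = {basis: consistent[basis] + conflicting[basis] for basis in consistent}
print(totals)                      # {'relational': 7, 'directional': 17}
print(sum(totals.values()))        # 24
print(round(100 * 503 / 506, 1))   # 99.4: test jumps to the trained side, groups 3-6
```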


THE TWO-CONFIGURATION PROBLEM 


On the basis of the definitions of the stimulus and of the response 
given in the preceding section, an equation of the learning curve can 
be derived. We will consider first the case in which the problem is to 
discriminate between two configurations each of which is composed of 
two stimuli differing with respect to one variable. The animal may
be rewarded for jumping to the left in response to configuration a and 
to the right in response to configuration b of figure 1. This is the sit- 
uation which would usually be described as requiring the animal to 
respond positively to the larger of the two stimuli. 


Definitions of Variables 


As far as possible we have used the symbols previously employed 
by Thurstone (28, 29) and Gulliksen (4) in order to emphasize the 
relationship between the concepts common to the three equations. The 
symbols e, s, u, w, h, g, k and c have been taken from the previous 
equations. We have added subscripts and primes to indicate exten- 
sions of meaning necessary in the present more generalized formu- 
lation. 

If we designate a cumulative count of correct responses by $w$ and
a cumulative count of incorrect responses by $u$, and assume that the
subject is being trained to go to the left for configuration a, and to the
right for configuration b (see figure 1), we have:

$w_a$ = number of left or correct responses to configuration a. The
animal is rewarded for each of these responses.
$u_a$ = number of right or incorrect responses to configuration a.
The animal is punished for each of these responses.
Correspondingly,
$w_b$ = number of right or correct responses to configuration b. The
animal is rewarded for each of these responses.
$u_b$ = number of left or incorrect responses to configuration b. The
animal is punished for each of these responses.

The strengths of the tendencies to make these four responses may
be represented as follows:
$s_a$ = strength of the tendency to make a left or correct response
to configuration a.
$e_a$ = strength of the tendency to make a right or incorrect response
to configuration a.
$s_b$ = strength of the tendency to make a right or correct response
to configuration b.
$e_b$ = strength of the tendency to make a left or incorrect response
to configuration b.


If the animal is trained positively to the smaller of the two stimuli
(see configurations a and b in figure 1), the words “right” and
“left” would be interchanged in all of the foregoing definitions. Such
a change would not affect any of the subsequent developments.
Basic Assumptions

(1) It will be assumed that the strength of the tendency to go to
the right is influenced by the amount of punishment and reward that
the rat has received for making right choices, i.e.,

$$e_a = f(u_a, w_b) \quad \text{and} \quad s_b = f(w_b, u_a).$$

Correspondingly, the strength of the tendency to go to the left is a
function of the punishment and reward of left choices, i.e.,

$$s_a = f(w_a, u_b) \quad \text{and} \quad e_b = f(u_b, w_a).$$

More definitely it will be assumed that:

$$\frac{\partial s_a}{\partial w_a} = k \quad \text{and} \quad \frac{\partial s_b}{\partial w_b} = k,$$

$k$ = the amount by which each reward of a correct response increases
the strength of the tendency to make the same response to the
same configuration.

$$\frac{\partial s_a}{\partial u_b} = -c' \quad \text{and} \quad \frac{\partial s_b}{\partial u_a} = -c',$$

$c'$ = the amount by which each punishment of an incorrect response to
one configuration decreases the tendency to make the same (or
correct) response to the other configuration ($c' < c$).

$$\frac{\partial e_a}{\partial u_a} = -c \quad \text{and} \quad \frac{\partial e_b}{\partial u_b} = -c,$$

$c$ = the amount by which each punishment of an incorrect response
decreases the strength of the tendency to make the same response
to the same configuration.

$$\frac{\partial e_a}{\partial w_b} = k' \quad \text{and} \quad \frac{\partial e_b}{\partial w_a} = k',$$

$k'$ = the amount by which each reward of a correct response to one
configuration increases the tendency to make the same (or incorrect)
response to the other configuration ($k' < k$).

The parameters $k$ and $c$ represent the learning constants of an
animal: the amount by which it profits from each rewarded or punished
trial. The parameters $k'$ and $c'$ represent those learning constants
diminished by the effect of the distance separating the two configurations.
$k'$ and $c'$ are, then, functions of $k$ and distance and of $c$ and
distance, respectively. The relation between the size of $k'$ or $c'$ and
the distance between the two configurations may be thought of as a
generalization gradient (9, 13). This concept is adapted from Pavlov’s
concept of generalization. The greater the distance between the
two configurations, the smaller is the value of $k'$ or $c'$. The nature of the
function relating amount of generalization to distance has not been
determined for visual discrimination data. Hovland (9) has found
a negatively accelerated generalization gradient for a galvanic response
conditioned to tonal stimuli. But the closest test stimulus used
by him was 25 j.n.d.’s removed from his training stimulus. The
shape of the generalization gradient between the training stimulus
and one removed by 25 j.n.d.’s has not been determined. Perhaps the
entire gradient might be represented by a single function of distance.
For the development of the present theory it is not necessary to
determine the exact function relating $k'$ (or $c'$) to distance. It need
only be assumed that the function is a monotonic decreasing one. The
findings of both Bass and Hull (1) and Hovland (9) support this
assumption. When stimulus configurations are accurately scaled and
quantitative predictions of the difficulty of different problems are
attempted, it will become necessary to determine the shape of the
generalization gradient more precisely.
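The theory needs only a monotonic decreasing gradient. For illustration only, the sketch below assumes an exponential form, $k' = k\,e^{-\lambda d}$, with a hypothetical `decay` parameter $\lambda$; any other decreasing function would serve equally well, and this form is not presented as the authors' choice.

```python
import math


def k_prime(k, distance, decay=1.0):
    """Hypothetical gradient k' = k * exp(-decay * distance); decay is illustrative."""
    return k * math.exp(-decay * distance)


distances = [0.0, 0.5, 1.0, 2.0, 4.0]
gradient = [k_prime(1.0, d) for d in distances]
print([round(x, 3) for x in gradient])   # [1.0, 0.607, 0.368, 0.135, 0.018]

# The only property the theory needs: k' falls as distance grows.
assert all(near > far for near, far in zip(gradient, gradient[1:]))
```

At zero distance the sketch gives $k' = k$ (identical configurations), and $k' < k$ for every positive distance, as the definition of $k'$ requires.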

Solving each pair of partial differential equations involving the
same response tendency (i.e., the same dependent variable) gives the
following equations for the four functions indicated above:

$$e_a = g_a - c\,u_a + k'\,w_b \tag*{(II-1)*}$$
$$s_a = h_a - c'\,u_b + k\,w_a \tag{II-2}$$
$$e_b = g_b - c\,u_b + k'\,w_a \tag{II-3}$$
$$s_b = h_b - c'\,u_a + k\,w_b \tag{II-4}$$

The $g$’s and $h$’s are constants of integration and represent the
strength of each of the tendencies at the beginning of the experiment.
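A minimal sketch of equations (II-1) to (II-4): given the four cumulative counts, it returns the four response strengths. The numerical constants in the usage line are hypothetical, chosen only to exercise the formulas; the paper gives no values.

```python
def strengths(u_a, w_a, u_b, w_b, *, g_a, g_b, h_a, h_b, k, c, k_prime, c_prime):
    """Response strengths from cumulative counts, per (II-1) to (II-4)."""
    return {
        "e_a": g_a - c * u_a + k_prime * w_b,   # right (incorrect) to a, (II-1)
        "s_a": h_a - c_prime * u_b + k * w_a,   # left (correct) to a, (II-2)
        "e_b": g_b - c * u_b + k_prime * w_a,   # left (incorrect) to b, (II-3)
        "s_b": h_b - c_prime * u_a + k * w_b,   # right (correct) to b, (II-4)
    }


# Hypothetical constants and counts, for illustration only.
result = strengths(2, 4, 3, 4, g_a=5, g_b=5, h_a=5, h_b=5,
                   k=1, c=0.5, k_prime=0.3, c_prime=0.1)
print({name: round(x, 3) for name, x in result.items()})
# {'e_a': 5.2, 's_a': 8.7, 'e_b': 4.7, 's_b': 8.8}
```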
(2) It will be assumed that the strengths of the correct and incorrect
responses are related to the number of these responses by the
equations:

$$\frac{du_a}{dw_a} = \frac{e_a}{s_a}, \tag{II-5}$$
$$\frac{du_b}{dw_b} = \frac{e_b}{s_b}. \tag{II-6}$$

* The number of each equation dealing with the two-configuration case will
be preceded by II.
Substituting equations (II-1), (II-2), (II-3), and (II-4) in (II-5)
and (II-6) gives the following differential equations:

$$\frac{du_a}{dw_a} = \frac{g_a - c\,u_a + k'\,w_b}{h_a - c'\,u_b + k\,w_a}, \tag{II-7}$$
$$\frac{du_b}{dw_b} = \frac{g_b - c\,u_b + k'\,w_a}{h_b - c'\,u_a + k\,w_b}. \tag{II-8}$$

(3) A third relationship is found in the experimental conditions.
These are usually so arranged that either:
a) each of the configurations a and b is presented the same
number of times; or
b) the animal makes the same number of correct responses to
each configuration.
The former is represented by

$$u_a + w_a = u_b + w_b, \tag{II-9}$$

the latter by

$$w_a = w_b. \tag*{(II-10)*}$$


Equation (II-9) represents the experimental conditions under 
which each configuration is presented an equal number of times. 
Only one response is allowed to each configuration. This technique 
is used by a number of experimenters. Equation (II-10) represents 
the experimental conditions under which the animal is allowed to re- 
spond to each configuration until it makes the correct response. The 
total number of correct responses is thus the same to both configura- 
tions, but the number of wrong responses to one may not equal the 
number of wrong responses to the other. This technique is some- 
times known as the Lashley method (19). 

* It is possible to regard the $u$’s and $w$’s as functions of time. In this case
(II-10) is exactly given by

$$w_a(t_i) = w_b(t_i) \tag{II-10a}$$

for a large number of values of $i$ throughout the course of the experiment, with
the qualification that $|w_a(t) - w_b(t)| < \varepsilon$ for every $t$. Where $\varepsilon$ is small (as is
the case with the Lashley technique) we can approximate (II-10a) by (II-10).

Since $w_a$ and $w_b$ are separate quantities in the learning process of the animal,
replacing them by a single variable is not only an approximation to the solutions
of (II-7) and (II-8) but also an approximation of the psychological model
represented by (II-7) and (II-8).

A similar argument applies in the case of equation (II-9).

The system of equations represented by either (II-7), (II-8), and (II-9)
or by (II-7), (II-8), and (II-10) is a soluble set of equations, but there
is no closed solution and at present a series approximation is not
worthwhile. Therefore, one simplifying assumption will be made. We
will consider only those cases where the initial strengths of the two
tendencies to respond correctly equal each other,

$$h_a = h_b, \tag{II-11}$$

and the initial strengths of the two tendencies to respond incorrectly
equal each other,

$$g_a = g_b. \tag{II-12}$$

$h_a$ need not equal $g_a$.

With this limitation it can be seen that equation (II-8) may be
obtained from equation (II-7) by interchanging the subscripts a and
b. It can be shown that with this limitation and with equation (II-10)
it follows that

$$u_a = u_b. \tag{II-13}$$

This can be seen by expanding $u_a$ and $u_b$ as power series (20)
in the variable $w$:

$$u_a = \sum_{i=0}^{\infty} a_i w^i, \qquad u_b = \sum_{i=0}^{\infty} b_i w^i. \tag{II-14}$$

Substituting (II-14) in (II-7) and (II-8), assuming (II-11) and
(II-12), and comparing coefficients of like powers of $w$ (both counts
start from zero, so $a_0 = b_0 = 0$), we find that

$$a_i = b_i$$

for every value of $i$, which proves (II-13).

Using (II-10) and (II-13), equations (II-7) and (II-8) reduce
to one equation with the subscripts eliminated, namely

$$\frac{du}{dw} = \frac{g - c\,u + k'\,w}{h - c'\,u + k\,w}. \tag{II-15}$$
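Equation (II-15) can also be integrated numerically to see the shape of the learning curve. The sketch uses a classical fourth-order Runge-Kutta step from $u(0) = 0$ with hypothetical constants ($k = c = 1$, $k' = c' = 0.5$, $g = 2$, $h = 4$; none come from the paper). Setting $dv/dw' = 0$ in (II-25) suggests that, for these constants, the slope $du/dw$ (errors per correct response) should settle at the smaller root of $c'v^2 - (k + c)v + k' = 0$; that limit is worked out here for illustration, not quoted from the paper.

```python
import math

# Illustrative constants only; the paper gives no numerical values.
k, c = 1.0, 1.0          # learning constants
k_p, c_p = 0.5, 0.5      # k', c'  (k' < k, c' < c)
g, h = 2.0, 4.0          # initial incorrect and correct strengths


def du_dw(w, u):
    """Right-hand side of (II-15)."""
    return (g - c * u + k_p * w) / (h - c_p * u + k * w)


def rk4(f, w0, u0, w_end, step=0.1):
    """Integrate du/dw = f(w, u) from w0 to w_end with classical fourth-order Runge-Kutta."""
    w, u = w0, u0
    for _ in range(round((w_end - w0) / step)):
        s1 = f(w, u)
        s2 = f(w + step / 2, u + step * s1 / 2)
        s3 = f(w + step / 2, u + step * s2 / 2)
        s4 = f(w + step, u + step * s3)
        u += step * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        w += step
    return u


# Fixed point of (II-25): smaller root of c'v^2 - (k + c)v + k' = 0.
r = math.sqrt((k + c) ** 2 - 4 * k_p * c_p)
v_star = ((k + c) - r) / (2 * c_p)

for w_end in (10, 100, 1000):
    u = rk4(du_dw, 0.0, 0.0, w_end)
    print(w_end, round(u, 2), round(du_dw(w_end, u), 4))
print("limiting ratio", round(v_star, 4))
```

Errors keep accumulating, but at a steadily falling rate per correct response, which is the negatively accelerated curve the derivation below expresses in closed form.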
Derivation of Equation of the Learning Curve 


Equation (II-15) may be solved by first shifting the origin to 
remove the constants g and h. In order to do this, set 


(II-15) 





$$u = u' + m \tag{II-16}$$

and

$$w = w' + n, \tag{II-17}$$

where

$$m = \frac{kg - k'h}{kc - k'c'} \tag{II-18}$$








142 PSYCHOMETRIKA 


and

$$n = \frac{c'g - ch}{kc - k'c'}. \tag{II-19}$$

Differentiating (II-16) and (II-17) gives

$$\frac{du}{dw} = \frac{du'}{dw'}. \tag{II-20}$$
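A quick numerical check of the shift of origin may be useful here. The sketch below uses hypothetical parameter values and helper names of our own (`kp` and `cp` stand for k' and c'). It computes m and n from (II-18) and (II-19) and confirms that the numerator and the denominator of (II-15) both vanish at u = m, w = n, which is why the constants g and h disappear after the shift.

```python
def shift_point(g, h, k, c, kp, cp):
    """Return (m, n) from (II-18) and (II-19); kp, cp stand for k', c'."""
    det = k * c - kp * cp
    m = (k * g - kp * h) / det
    n = (cp * g - c * h) / det
    return m, n


# Hypothetical parameter values, for illustration only.
g, h, k, c, kp, cp = 2.0, 1.5, 0.10, 0.08, 0.02, 0.015
m, n = shift_point(g, h, k, c, kp, cp)

# Numerator and denominator of (II-15) evaluated at the new origin (m, n).
numerator = g - c * m + kp * n
denominator = h - cp * m + k * n
print(m, n, numerator, denominator)
```

Both printed residuals are zero up to rounding, so after the substitution only the terms in u' and w' survive.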


Substituting (II-16), (II-17), (II-18), (II-19), and (II-20) in (II-15) gives:





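$$\frac{du'}{dw'} = \frac{g - c\left(u' + \dfrac{kg - k'h}{kc - k'c'}\right) + k'\left(w' + \dfrac{c'g - ch}{kc - k'c'}\right)}{h - c'\left(u' + \dfrac{kg - k'h}{kc - k'c'}\right) + k\left(w' + \dfrac{c'g - ch}{kc - k'c'}\right)}. \tag{II-21}$$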
Simplifying (II-21) gives

$$\frac{du'}{dw'} = \frac{k'w' - cu'}{kw' - c'u'}. \tag{II-22}$$
Again defining a new variable v by the equation

$$u' = vw' \tag{II-23}$$

and differentiating (II-23), we have

$$\frac{du'}{dw'} = v + w'\frac{dv}{dw'}. \tag{II-24}$$


Substituting (II-23) and (II-24) in (II-22) and cancelling w' from the right-hand member of the equation gives

$$v + w'\frac{dv}{dw'} = \frac{k' - cv}{k - c'v}. \tag{II-25}$$
Separating the variables in (II-25) gives

$$\frac{dw'}{w'} = \frac{(k - c'v)\,dv}{k' - (k+c)v + c'v^2}. \tag{II-26}$$


To integrate the right-hand member of (II-26) it is necessary to 
multiply numerator and denominator by 2 and to add and subtract 
c dv in the numerator. Then, rearranging terms, we may write 


$$\frac{dw'}{w'} = \frac{1}{2}\,\frac{k + c - 2c'v}{k' - (k+c)v + c'v^2}\,dv + \frac{k - c}{2}\,\frac{dv}{k' - (k+c)v + c'v^2}. \tag{II-27}$$


Integrating (II-27) gives

$$\log w' = -\frac{1}{2}\log\left[c'v^2 - (k+c)v + k'\right] + \frac{k-c}{2r}\log\frac{2c'v - (k+c) - r}{2c'v - (k+c) + r} + \log K, \tag{II-28}$$

where

$$r^2 = (k+c)^2 - 4c'k', \tag{II-29}$$

and log K is the constant of integration. The logarithmic rather than the arc tangent form of the integral for the last term of equation (II-27) is appropriate since $r^2 > 0$, as it is whenever $c'k' < ck$, because $(k+c)^2 \ge 4ck$.
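To confirm that (II-28) really integrates (II-26), the sketch below compares a central-difference derivative of the right-hand member of (II-28), with log K omitted, against the integrand of (II-26) at a few values of v. The parameter values and function names are hypothetical. Absolute values are taken inside the logarithms so that the check also holds where the quadratic in v is negative.

```python
import math

# Hypothetical parameter values; kp and cp stand for k' and c'.
k, c, kp, cp = 0.10, 0.08, 0.02, 0.015
r = math.sqrt((k + c) ** 2 - 4 * cp * kp)  # (II-29); positive here


def quadratic(v):
    """Denominator of (II-26): k' - (k + c) v + c' v**2."""
    return cp * v * v - (k + c) * v + kp


def integrand(v):
    """Right-hand member of (II-26), per unit dv."""
    return (k - cp * v) / quadratic(v)


def antiderivative(v):
    """Right-hand member of (II-28) with the constant log K omitted."""
    b = k + c
    ratio = (2 * cp * v - b - r) / (2 * cp * v - b + r)
    return -0.5 * math.log(abs(quadratic(v))) + (k - c) / (2 * r) * math.log(abs(ratio))


def central_difference(f, x, step=1e-6):
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + step) - f(x - step)) / (2 * step)


for v in (0.05, 1.0, 20.0):
    print(v, central_difference(antiderivative, v), integrand(v))
```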

Taking antilogarithms of (II-28) gives 


$$w' = K\left[c'v^2 - (k+c)v + k'\right]^{-1/2}\left[\frac{2c'v - (k+c) - r}{2c'v - (k+c) + r}\right]^{\frac{k-c}{2r}}. \tag{II-30}$$

Substituting the original variables u and w from equations (II-16), (II-17), and (II-23) gives




















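$$w - n = K\left[c'\left(\frac{u-m}{w-n}\right)^2 - (k+c)\,\frac{u-m}{w-n} + k'\right]^{-1/2}\left[\frac{2c'\,\dfrac{u-m}{w-n} - (k+c) - r}{2c'\,\dfrac{u-m}{w-n} - (k+c) + r}\right]^{\frac{k-c}{2r}}, \tag{II-31}$$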


where m, n and r are defined by equations (II-18), (II-19), and 
(II-29). 
Equation (II-31) may be rewritten as follows: 








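$$K = \left[c'(u-m)^2 - (k+c)(u-m)(w-n) + k'(w-n)^2\right]^{1/2}\left[\frac{2c'(u-m) - (k+c-r)(w-n)}{2c'(u-m) - (k+c+r)(w-n)}\right]^{\frac{k-c}{2r}}, \tag{II-32}$$

or

$$\left[c'(u-m)^2 - (k+c)(u-m)(w-n) + k'(w-n)^2\right]\left[\frac{2c'(u-m) - (k+c-r)(w-n)}{2c'(u-m) - (k+c+r)(w-n)}\right]^{\frac{k-c}{r}} = K^2. \tag{II-33}$$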


Since at the origin u = w = 0, we may evaluate $K^2$ from equation (II-33) by setting u and w both equal to zero. Doing this gives

$$K^2 = \left[c'm^2 - (k+c)mn + k'n^2\right]\left[\frac{2c'm - (k+c-r)n}{2c'm - (k+c+r)n}\right]^{\frac{k-c}{r}}. \tag{II-34}$$

Equation (II-33), giving $K^2$ the value defined by (II-34), is the solution for the conditions given at the beginning of the derivation.
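One way to gain confidence in (II-33) and (II-34) is to integrate (II-15) numerically from the origin and check that the left-hand member of (II-33) stays equal to $K^2$ along the path. The sketch below does this with a classical fourth-order Runge-Kutta step, using w as the independent variable. The integrator, the step size, the parameter values and the function names are our own choices and are not part of the original derivation.

```python
import math


def invariant(u, w, g, h, k, c, kp, cp):
    """Left-hand member of (II-33); kp, cp stand for k', c'."""
    det = k * c - kp * cp
    m = (k * g - kp * h) / det                      # (II-18)
    n = (cp * g - c * h) / det                      # (II-19)
    r = math.sqrt((k + c) ** 2 - 4 * cp * kp)       # (II-29)
    du, dw = u - m, w - n
    quad = cp * du ** 2 - (k + c) * du * dw + kp * dw ** 2
    ratio = (2 * cp * du - (k + c - r) * dw) / (2 * cp * du - (k + c + r) * dw)
    return quad * ratio ** ((k - c) / r)


def slope(u, w, g, h, k, c, kp, cp):
    """du/dw from (II-15)."""
    return (g - c * u + kp * w) / (h - cp * u + k * w)


def integrate(params, w_end=200.0, step=0.25):
    """Classical fourth-order Runge-Kutta from the origin, w as independent variable."""
    u, w = 0.0, 0.0
    path = [(u, w)]
    while w < w_end - 1e-9:
        s1 = slope(u, w, *params)
        s2 = slope(u + 0.5 * step * s1, w + 0.5 * step, *params)
        s3 = slope(u + 0.5 * step * s2, w + 0.5 * step, *params)
        s4 = slope(u + step * s3, w + step, *params)
        u += step * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        w += step
        path.append((u, w))
    return path


params = (2.0, 1.5, 0.10, 0.08, 0.02, 0.015)  # hypothetical g, h, k, c, k', c'
path = integrate(params)
K2 = invariant(0.0, 0.0, *params)               # (II-34)
drift = max(abs(invariant(u, w, *params) / K2 - 1) for u, w in path)
print(f"K^2 = {K2:.6f}, largest relative drift along the path = {drift:.2e}")
```

For these values the path stays on one side of both asymptotes, so the base of the fractional power remains positive throughout.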

If c’ and k’ in equation (II-33) are set equal to zero (i.e., if transfer effects are negligible) the equation simplifies to the form previously presented by Gulliksen (4). If, further, c = k in equation (II-33)
(i.e., if the effect of punishing an incorrect response is assumed to be 
equal to the effect of rewarding a correct response) the equation fur- 
ther reduces to the form developed by Thurstone (29). Both of these 
equations are, therefore, special cases of equation (II-33). 

Equation (II-33) may be simplified in a special case by ignoring 
the exponential term. As c approaches k this term approaches unity, 
so that (II-33) reduces to 

$$c'(u-m)^2 - (k+c)(u-m)(w-n) + k'(w-n)^2 = K^2. \tag{II-35}$$


Re-evaluating $K^2$ by setting u = w = 0 gives

$$K^2 = c'm^2 - (k+c)mn + k'n^2. \tag{II-36}$$
Substituting (II-36) in (II-35) and simplifying gives 


$$c'u^2 - 2c'mu - (k+c)(uw - mw - nu) + k'w^2 - 2k'nw = 0. \tag{II-37}$$


Substituting the values of m and n from (II-18) and (II-19) gives 








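$$c'u^2 + k'w^2 - (k+c)uw + \left[\frac{(k+c)(c'g - ch)}{kc - k'c'} - \frac{2c'(kg - k'h)}{kc - k'c'}\right]u + \left[\frac{(k+c)(kg - k'h)}{kc - k'c'} - \frac{2k'(c'g - ch)}{kc - k'c'}\right]w = 0. \tag{II-38}$$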


Equation (II-38) holds exactly in the special case where c = k 
and c’ = k’ since then the exponential term in equation (II-33) is equal 
to unity. In this case equation (II-38) becomes 


$$u^2 + w^2 - 2\frac{k}{k'}uw - 2\frac{h}{k'}u + 2\frac{g}{k'}w = 0. \tag{II-39}$$
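The text does not show how the linear coefficients of (II-39) follow from (II-38). The worked algebra below, which we supply, substitutes m and n from (II-18) and (II-19) with c = k and c' = k':

```latex
\begin{aligned}
m &= \frac{kg - k'h}{k^2 - k'^2}, \qquad n = \frac{k'g - kh}{k^2 - k'^2},\\
(k+c)\,n - 2c'm &= 2(kn - k'm) = \frac{2\left[k(k'g - kh) - k'(kg - k'h)\right]}{k^2 - k'^2} = \frac{-2h\,(k^2 - k'^2)}{k^2 - k'^2} = -2h,\\
(k+c)\,m - 2k'n &= 2(km - k'n) = \frac{2\left[k(kg - k'h) - k'(k'g - kh)\right]}{k^2 - k'^2} = \frac{2g\,(k^2 - k'^2)}{k^2 - k'^2} = 2g.
\end{aligned}
```

Equation (II-38) therefore becomes $k'u^2 + k'w^2 - 2kuw - 2hu + 2gw = 0$, and dividing by k' gives (II-39).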


Assuming in addition that the strengths of the two responses involved 
are equal at the beginning, i.e., that g = h, we have 


$$u^2 + w^2 - 2\frac{k}{k'}uw - 2\frac{g}{k'}u + 2\frac{g}{k'}w = 0. \tag{II-40}$$











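Solving explicitly for u, and taking the root that vanishes at w = 0, we get

$$u = \frac{g}{k'} + \frac{k}{k'}w - \sqrt{\left[\left(\frac{k}{k'}\right)^2 - 1\right]w^2 + 2\left[\frac{k}{k'} - 1\right]\frac{g}{k'}w + \left(\frac{g}{k'}\right)^2}. \tag{II-41}$$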


Equations (II-38), (II-39), (II-40), and (II-41) are hyperbolas pass- 
ing through the origin. The curve is shown in figure 5. The terms in 
these equations have the following meanings. 


u = cumulative count of incorrect responses. 

w = cumulative count of correct responses. 

g = initial strength of the tendency to respond incorrectly. 

h = initial strength of the tendency to respond correctly. 

k = the amount by which each reward of a response increases the 
strength of the tendency to make the same response to the 
same configuration. 

k’ = the amount by which each reward of a response increases 
the strength of the tendency to make the same response to 
the other configuration. 

c = the amount by which each punishment of a response de- 
creases the strength of the tendency to make the same re- 
sponse to the same configuration. 

c’ = the amount by which each punishment of a response de- 
creases the strength of the tendency to make the same re- 
sponse to the other configuration. 
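As a concrete illustration of (II-41), the sketch below evaluates the cumulative error curve for hypothetical values g = h = 1, k = c = 0.10, k' = c' = 0.02, with function names of our own. It checks three things: that the points satisfy (II-40), that the curve starts at the origin with slope g/h = 1, and that the long-run error rate du/dw approaches the lower asymptote slope of the special case, $(k - \sqrt{k^2 - k'^2})/k'$.

```python
import math


def cumulative_errors(w, g, k, kp):
    """Equation (II-41): cumulative errors u after w correct responses.

    Special case g = h, c = k, c' = k'; kp stands for k'.
    The minus sign selects the branch that passes through the origin.
    """
    a = k / kp
    b = g / kp
    return b + a * w - math.sqrt((a * a - 1) * w * w + 2 * (a - 1) * b * w + b * b)


def residual(u, w, g, k, kp):
    """Left-hand member of (II-40); zero for points on the curve."""
    a = k / kp
    b = g / kp
    return u * u + w * w - 2 * a * u * w - 2 * b * u + 2 * b * w


g, k, kp = 1.0, 0.10, 0.02  # hypothetical values
for w in (0.0, 10.0, 100.0, 1000.0):
    u = cumulative_errors(w, g, k, kp)
    print(f"w = {w:7.1f}  u = {u:9.4f}  residual = {residual(u, w, g, k, kp):.2e}")

# Long-run error rate versus the slope of the lower asymptote.
late_slope = cumulative_errors(1e4 + 1, g, k, kp) - cumulative_errors(1e4, g, k, kp)
print(late_slope, (k - math.sqrt(k * k - kp * kp)) / kp)
```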


Properties of This Learning Curve


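For equation (II-38) the two asymptotes are

$$u = \frac{k+c\pm r}{2c'}\,w + m - \frac{(k+c\pm r)\,n}{2c'}. \tag{II-42}$$

The secant of the angle between the two asymptotes is

$$\sec\theta = \frac{\sqrt{(k+c)^2 + (c'-k')^2}}{c'+k'}.$$

The tangent of this angle is

$$\tan\theta = \frac{r}{c'+k'} = \frac{\sqrt{(k+c)^2 - 4c'k'}}{c'+k'}.$$

The asymptotes intersect at the point u = m, w = n.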
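The asymptote slopes in (II-42) are the two roots s of $c's^2 - (k+c)s + k' = 0$. These are the directions u = sw along which the quadratic part of (II-38) vanishes. The sketch below, with hypothetical parameter values and a function name of our own, computes the slopes, the intersection (m, n) and the angle between the asymptotes. The test checks the tangent against the usual two-line formula and checks the secant via sec² = 1 + tan².

```python
import math


def asymptote_geometry(g, h, k, c, kp, cp):
    """Slopes du/dw and intersection of the asymptotes (II-42), plus the angle between them.

    kp, cp stand for k', c'. Returns (slopes, (m, n), tan_theta, sec_theta).
    """
    r = math.sqrt((k + c) ** 2 - 4 * cp * kp)          # (II-29)
    det = k * c - kp * cp
    m = (k * g - kp * h) / det                          # (II-18)
    n = (cp * g - c * h) / det                          # (II-19)
    slopes = ((k + c + r) / (2 * cp), (k + c - r) / (2 * cp))
    tan_theta = r / (cp + kp)
    sec_theta = math.sqrt((k + c) ** 2 + (cp - kp) ** 2) / (cp + kp)
    return slopes, (m, n), tan_theta, sec_theta


params = dict(g=2.0, h=1.5, k=0.10, c=0.08, kp=0.02, cp=0.015)  # hypothetical
(s1, s2), (m, n), tan_theta, sec_theta = asymptote_geometry(**params)
print(s1, s2, m, n, tan_theta, sec_theta)
```

The smaller slope s2 belongs to the asymptote that the cumulative error curve approaches as practice continues.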
In the special case g = h, c = k, and c’ = k’ for which equation 
(II-40) holds exactly, the asymptotes are 
$$u = \frac{k \pm \sqrt{k^2 - k'^2}}{k'}\,w + \frac{g}{k'}\left[1 \pm \sqrt{\frac{k-k'}{k+k'}}\right]. \tag{II-43}$$

The secant of the angle between the asymptotes is $k/k'$. The tangent of this angle is $\sqrt{k^2 - k'^2}/k'$. The asymptotes intersect at the point $u = g/(k+k')$, $w = -g/(k+k')$.
FIGURE 5
The cumulative error curve and its asymptotes (cumulative errors u plotted against cumulative correct responses w).


Application to Experimental Data 


(1) In deriving equation (II-39) from (II-38) it was assumed that c = k and c’ = k’. For any given set of data it is possible to test this assumption.

If c = k and if c’ = k’, then the coefficients of u² and w² in equation (II-37) will be equal. After these coefficients have been found for
a particular learning curve, their equality or departure from equality 
will reveal the equality or inequality of the learning effects of punish- 
ing an incorrect and rewarding a correct response. 
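The text leaves open how the coefficients are to be "found" for a learning curve. One possibility, sketched below under our own assumptions, is an ordinary least-squares fit of a conic through the origin, normalized so that the coefficient of w² is 1. The fitted coefficient of u² then estimates c'/k'. The data here are synthetic, generated from (II-38) with c = k (where the exponential term of (II-33) is exactly unity), so the fit should recover the ratio used to generate them. The least-squares routine (modified Gram-Schmidt QR), the parameter values and all function names are generic choices of ours, not the authors' procedure.

```python
import math


def conic_coefficients(g, h, k, c, kp, cp):
    """Coefficients (A, B, C, D, E) of A u^2 + B w^2 + C uw + D u + E w = 0, i.e. (II-38)."""
    det = k * c - kp * cp
    m = (k * g - kp * h) / det                     # (II-18)
    n = (cp * g - c * h) / det                     # (II-19)
    return cp, kp, -(k + c), (k + c) * n - 2 * cp * m, (k + c) * m - 2 * kp * n


def point_on_curve(w, coeffs):
    """Solve the conic for u on the branch through the origin."""
    A, B, C, D, E = coeffs
    b = C * w + D
    disc = b * b - 4 * A * (B * w * w + E * w)
    return (-b - math.sqrt(disc)) / (2 * A)


def least_squares(rows, rhs):
    """Minimize ||X beta - y|| with a modified Gram-Schmidt QR factorization."""
    ncol = len(rows[0])
    Q, R = [], [[0.0] * ncol for _ in range(ncol)]
    for j in range(ncol):
        v = [row[j] for row in rows]
        for i in range(j):
            R[i][j] = sum(x * y for x, y in zip(Q[i], v))
            v = [x - R[i][j] * y for x, y in zip(v, Q[i])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    qty = [sum(x * y for x, y in zip(q, rhs)) for q in Q]
    beta = [0.0] * ncol
    for j in reversed(range(ncol)):
        beta[j] = (qty[j] - sum(R[j][i] * beta[i] for i in range(j + 1, ncol))) / R[j][j]
    return beta


# Synthetic learning curve from hypothetical parameters with c = k.
coeffs = conic_coefficients(g=1.0, h=1.0, k=0.10, c=0.10, kp=0.03, cp=0.02)
data = [(point_on_curve(float(w), coeffs), float(w)) for w in range(5, 205, 5)]

# Fit alpha*u^2 + gamma*uw + delta*u + eps*w = -w^2 (coefficient of w^2 fixed at 1).
X = [[u * u, u * w, u, w] for u, w in data]
y = [-w * w for _, w in data]
alpha, gamma, delta, eps = least_squares(X, y)
print("estimated c'/k' =", alpha)   # equals 1 only if c' = k'
```

With real, noisy learning data the estimate would carry sampling error, so an observed ratio would have to be compared with 1 allowing for that error.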

(2) In deriving equation (II-40) from equation (II-39), it was assumed that the strengths of the two responses were equal at the beginning of practice. It is possible to check upon the accuracy of this assumption for any given set of data, after it has been shown that c = k and c’ = k’.

The asymptotes of the curve (II-39) intersect at the point

$$u = \frac{kg - k'h}{k^2 - k'^2}, \qquad w = \frac{k'g - kh}{k^2 - k'^2}.$$

If g and h are in reality equal, these coordinates reduce to $u = g/(k+k')$ and $w = -g/(k+k')$. If the intersection of the


asymptotes of the plotted curve is equidistant from the u and w axes, 
then g and h must be equal to each other. Any departure from equality 
will indicate which tendency was stronger. The amount of inequality 
will indicate the size of the difference in the initial strengths of the 
two tendencies. 

(3) It may be deduced that the difficulty of a problem, as measured by the maximum level of accuracy attainable, is inversely related to the distance separating the two configurations to be discriminated.

From equation (II-42) or equation (II-43) (the equations of the asymptotes of the learning curve) it can be seen that the slope of the asymptotes is a function of the relative sizes of k and k’ and of c and c’. These ratios, in turn, are functions of the distance separating
the configurations to be discriminated. If the distance is zero, a situa- 
tion which can occur only when the subject is forced to attempt to dis- 
criminate between a configuration and itself, then c = c’ and k = k’. 
In this case the two asymptotes will have a slope of unity, the learning 
curve equation will become u = w, no learning will be possible, and 
the subject will continue to respond with only chance accuracy. 

From the same equations it can be seen that if the distance sepa- 
rating the two configurations is sufficiently great to make c’ and k’ 
insignificantly different from zero, then the asymptote approached by 
the learning curve will have a slope of zero. In this case perfect learn- 
ing is possible. 
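The two limiting cases above can be read off the slope of the lower asymptote in (II-42). The sketch below writes $(k+c-r)/(2c')$ in the algebraically equivalent form $2k'/(k+c+r)$ (multiply numerator and denominator by $k+c+r$ and use (II-29)). This form stays finite when c' = 0. The parameter values and the function name are hypothetical: the transfer effects are taken as a fraction t of the direct effects, so t = 0 means no transfer and t = 1 means a configuration is being discriminated from itself.

```python
import math


def learning_asymptote_slope(k, c, kp, cp):
    """Slope du/dw of the lower asymptote in (II-42); kp, cp stand for k', c'."""
    r = math.sqrt(max(0.0, (k + c) ** 2 - 4 * cp * kp))  # (II-29); guard against rounding below 0
    return 2 * kp / (k + c + r)                          # equal to (k + c - r) / (2 c')


# Hypothetical: k = c = 0.10, transfer effects k' = c' = t * 0.10.
for t in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(t, learning_asymptote_slope(0.10, 0.10, t * 0.10, t * 0.10))
```

The slope rises from 0 (perfect learning is possible) to 1 (errors keep pace with correct responses, so accuracy stays at chance) as transfer grows.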

For most configurations used in ordinary learning experiments c’ 
and k’ have probably been small in comparison with c and k. No learn- 
ing studies have reported attempts to determine whether or not the 
asymptotes had a small positive slope which increased with the diffi- 
culty of the discrimination, but much of the psychophysical work has 
shown that the level of accuracy attainable increases with an increase
in the difference between the stimuli being discriminated. 
THE UNIT HIERARCHY AND ITS PROPERTIES


CYRIL BURT 
Psychological Laboratory, University College, London 


A correlation matrix may be expanded as the weighted sum of a 
series of ‘unit hierarchies’. The properties of the ‘unit hierarchy’
are not only of theoretical interest for themselves, but lead to sim- 
pler modes of practical calculation. The analysis is analogous to a 
spectral set of projective operations in quantum-theory: and the 
analogy itself suggests many further problems and solutions. 


In several investigations (1, 2, 3, 21) my research students and 
I have applied a modified form of factor analysis that has proved to 
possess definite advantages of its own. It seems a little more exact in 
theory than the methods generally current in England, a little less 
laborious in practice than those recently advocated in America, and 
rather more in line with solutions worked out by mathematicians and 
physicists who deal with similar issues in other fields. Hitherto, how- 
ever, the references to it have been incidental to some special prob- 
lem. Accordingly, I gladly welcome the suggestion that a more ex- 
plicit account should be offered of the theoretical arguments on which 
it rests. 


I. Definition of Factors 


In summarizing the general principles available for analyzing a 
table of correlations, I have distinguished two main alternatives: (a) 
a ‘group factor’ method, and (b) a ‘common factor’ method. Method 
(a) proceeds by partitioning the matrix to be factorized into separate 
submatrices, and seeks to explain each of these by a positive factor of 
limited range; method (b) deals with the matrix throughout as a 
whole, even when some of its elements or submatrices are negative. 
The latter mode of approach has usually been preferred by the Eng- 
lish School, since its members have been interested primarily in the 
search for a single central factor. But in early work on educational 
testing, I found it necessary from the very outset to envisage, not 
only ‘general factors’, but also ‘specific’ factors (i.e., factors apparently restricted to groups of tests); accordingly, the ‘group factor’ method
was introduced for purposes of practical calculation. Since in theory 
the group factors (which need not be independent) may always be 
obtained from the independent common factors by a further rotation 
of axes, I shall confine myself here to what I have called method (b).
This mode of reduction implies that the factors so obtained can 
claim in the first instance to be nothing more than mathematical ab- 
stractions; as they stand, they will have no necessary relation to con- 
crete psychological distinctions. If in actual practice such identifica- 
tions are often possible, that is due to an appropriate choice of the 
tests to be correlated, not to any properties of the mathematical pro- 
cedure as such. What the mathematical procedure can guarantee is 
that each factor, as it is extracted, will have a maximum discrimina- 
tive and predictive value, but not that it will have a maximum of 
psychological meaning or of psychological simplicity.* 

In accordance with these principles I define the kth factor as that
component which will account for the maximum amount of variance 
remaining after (k — 1) components have been removed. This def- 
inition implies that our primary aim is the ‘analysis of variance’. 
Thus I consider the real object of factor-analysis to be the analysis of 
the original matrix of measurements; the analysis of the correlations 
(which originally gave rise to the search for factors) I regard as 
merely a means to that end. Such a standpoint is in keeping with the 
trend of English statistical work in other directions (cf. Fisher, 4) ; 
and enables us to apply factor-analysis to the results of correlating 
persons as well as traits. 


* As a working procedure, the group factor method is required only when 
the variables correlated form a discontinuous series, that is to say, when the co- 
efficients can be grouped together to form two or more submatrices of compara- 
tively high positive figures separated by submatrices of negative, zero, or com- 
paratively low figures. A two-fold pattern of this nature occurs most conspicu- 
ously when the first general factor has been partialled out; but it may also be 
seen in the observed correlations themselves, when the general factor has in effect 
been eliminated by some other device (e.g., by choosing homogeneous populations 
or material, or by re-scaling the measurements transversely in terms of devia- 
tions about their average). Even then the whole table may still be analyzed in 
terms of a common factor, which will now be bipolar. (One or two recent inves- 
tigators have assumed that it is impossible to extract a general factor when one 
or more rectangular submatrices contain negative coefficients: the method described below shows clearly that this assumption is erroneous). In certain circumstances, however, the variables selected for correlation form discontinuous
groups: for instance, in correlating the results of scholastic tests, the tests of (i) 
handwork, (ii) arithmetic, (iii) verbal subjects (reading, spelling, composition), 
yield three discontinuous blocks of positive correlations. In such a case the group 
factor method is more appropriate. A typical example may be seen in my analy- 
sis of school subjects in Distribution and Relations of Educational Abilities, 
1917, Table XX, p. 57. 

† These points were briefly stated in the memorandum I was asked to pre-
pare for the International Institute Examinations Inquiry (1, pp. 247, 275, 306). 
Since that was first drawn up, Kelley (5) has published a ‘new method of analy- 
sis’ which is also to be applied to covariances: but he seems to be in error in 
assuming that the results are identical whether covariances or correlations are 
used. In what follows, since the more appropriate terms, ‘comultiplying’ and 
‘covariating’, are still somewhat unfamiliar, and the distinctions are here unim- 
portant, I shall use the term ‘correlating’ loosely to cover all three processes. 
II. The Canonical Expansion


With the foregoing definition of factors, the problem of factor 
analysis becomes simply a problem in multiple correlation; and a 
unique algebraic solution can be at once obtained along the usual lines 
by invoking the principle of least squares.* If there are n tests or 
traits, the result will express the matrix of observed measurements 
in terms of n independent factors, which emerge in order of the 
amount contributed by each to the total variance. Of these, however, 
the factors that emerge last of all will have little or no statistical sig- 
nificance. If, therefore, we seek only the first k, and from these re- 
construct a reduced matrix of test-measurements, or a reduced matrix 
of correlations, then the reconstruction will yield the nearest fit to 
the originals that can be derived with this smaller number of factors: 
e.g., if we take k = 1, we shall reach the closest approximation obtainable in terms of a single ‘general factor’ only.†

The matrix expression‡ of this analysis is sufficiently familiar:
but the method that I propose pushes the analysis one stage further 


Recent illustrations of work in our laboratory on correlations between persons, 
using both ‘method a’ and ‘method b’, have been described in this journal by my 
colleague Dr. Stephenson (6); a review of earlier applications of the device, with 
a … of its value and limitations, is given in my own paper on the subject.

*The proof of the least squares formula for the saturation coefficients is 
given in my memorandum for the International Institute Examinations Inquiry 
(1, pp. 247, 286); the proof of the formula for the multiple correlation coefficient 
is given by Kelley (9, p. 296). We have only to substitute 7,; for Kelley’s 7; 


(i.e., to identify the ‘criterion’ with the ‘general factor’, taken as the best weighted sum of the tests themselves), and we are led immediately to equation (6).

† In passing it may be noted that if the ‘general factor’ is conceived as dis-
tributing the individuals, not along a linear scale, but into two discrete classes 
(such as Dr. Stephenson’s ‘psychological types’, which are described as being ‘as 
discontinuous as the two sexes’), the result is essentially the same: we reach a 
factor that yields the widest discrimination, no longer between the different in- 
dividuals, but between the different types: (in the notation used below, the usual 
regression coefficients, $w = f'R^{-1}$, become $w = d'R^{-1}$, where the vector d denotes
the differences between the means of the two functions. Except that one determi- 
nant has here to be evaluated instead of two, the expression is analogous to that 
reached by Fisher (6) for somewhat similar problems in other statistical fields). 

Where the classification into types is itself determined on the basis of inde- 
pendent multiple measurements, and the problem is to correlate a mental classi- 
fication with a physical (as in verifying Kretschmer’s theory), the issue becomes 
more complex. As I have indicated elsewhere (21, p. 186), we are then correlat- 
ing, not a scalar variable with a vector, but one vector with another; and the 
analysis becomes a problem in what may be called bi-multiple (or vector) correlation.
‡ In what follows, the same letter of the alphabet is used to denote a matrix,
its vectors (i.e., its rows or columns), and its elements: the matrix will be in- 
dicated by a capital letter, the vector by a lower case letter in bold type, and 
the elements by a lower case letter in italics with a double subscript. So far as 
possible, I have substituted Thurstone’s notation for my own. His book (7) un- 
fortunately had not appeared when my memorandum for the International Ex- 
aminations Committee was drawn up: further, it seemed at that time desirable 
that the notation should, so far as possible, preserve the symbols most familiar to 
English students, namely, those connected with Spearman’s methods of analysis. 
back than its usual formulation.* The fundamental postulate, and the
final outcome, of factor analysis is usually written 


$$S = FP \tag{1}$$


where S = the ‘score matrix’ of observed measurements, F = the 
‘factorial matrix’ of factor-loadings or saturation coefficients, P = 
the orthogonal or semi-orthogonal ‘population matrix’ giving the hy- 
pothetical measurements of the population for the several indepen- 
dent factors as above defined. 

Let us, however, put 


$$\sum_i p_{1i}^2 = v_1, \quad \ldots, \quad \sum_i p_{ni}^2 = v_n, \tag{2}$$


so that $v_i$ denotes† the amount contributed to the total variance by the $i$th factor. We may then rewrite (1) in the more convenient form


$$S = LV^{1/2}P, \tag{3}$$


where L denotes the orthogonal matrix of direction cosines specifying 
the relation of factor-axes to test-axes, and V the diagonal matrix of 
factor variances. 

From this ‘canonical resolution’ of the score matrix S we at once 
obtain 


$$R = SS' = FF' \tag{4}$$

$$R = LVL', \tag{5}$$


where R denotes what in the analysis of variance would be termed 
the matrix of ‘sums of squares and products’ and (if the scores are 
in unitary standard measure) is simply the matrix of correlations be- 
tween the various tests. It will be seen that R must be real, square, 
symmetric, positive-definite,‡ and of rank n.

We have now reduced the correlation matrix to its simplest or 
‘canonical’ form. Equations (3) and (5) which express these canon- 
ical transformations I regard as fundamental. It is on them that 
most of my arguments will be based. 
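To make the canonical form (5) concrete, the sketch below computes V and L for a small correlation matrix and checks that $R = LVL'$ with L orthogonal. We use the cyclic Jacobi rotation method, a standard technique for real symmetric matrices chosen here only for brevity; it is not the computing procedure the author describes. The 3 × 3 correlation matrix and all function names are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def jacobi_eigen(R, tol=1e-20, max_sweeps=100):
    """Return (v, L) with R = L diag(v) L' for a real symmetric matrix R.

    Column i of L is the unit latent vector l_i; v is sorted in decreasing
    order, so the first factor accounts for the most variance.
    """
    n = len(R)
    A = [[float(x) for x in row] for row in R]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element of J'AJ vanishes.
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                cos_t, sin_t = math.cos(theta), math.sin(theta)
                J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
                J[p][p] = J[q][q] = cos_t
                J[p][q], J[q][p] = sin_t, -sin_t
                A = matmul(transpose(J), matmul(A, J))
                L = matmul(L, J)
    order = sorted(range(n), key=lambda i: -A[i][i])
    v = [A[i][i] for i in order]
    L = [[row[i] for i in order] for row in L]
    return v, L


R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]          # hypothetical correlations
v, L = jacobi_eigen(R)
print("factor variances:", v)   # they sum to the trace of R, here 3
```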

In theory the values required for V and L can be obtained by 
solving the equation 

* My equation (1) is Thurstone’s equation (2), (7, p. 54) ; my equation (4) is 
equivalent to his equations (22) and (39). The general relations between Kelley’s 
analysis and Thurstone’s may be expressed by saying that, for Thurstone, 


$LV^{1/2}$ (= F) would denote the regression coefficients for determining S from P;
L would denote the regression coefficients for determining S from …


† I have substituted V and v for the Greek letters Λ and λ that are almost
universally used for the ‘latent’ or (as they were once called) ‘lambdaic roots’ in 
matrix algebra. The former are the natural symbols for variances in statistical 
notation: the latter present minor difficulties in typing and in teaching. 

‡ i.e., all its principal minors are greater than or equal to zero.
$$R\mathbf{l} = v\mathbf{l}. \tag{6}$$


Here R is a given square matrix, defined as above, and $\mathbf{l}$ a column vector of n elements, as yet undetermined. We obtain, as the condition of a non-trivial solution, n values for $v_i$; and, on substituting in equation (6) each of these values in turn, we may solve the n homogeneous equations for the ratios

$$l_{1i} : l_{2i} : \cdots : l_{ni}.$$

If we add the condition that

$$l_{1i}^2 + l_{2i}^2 + \cdots + l_{ni}^2 = 1,$$

the resulting values for $l_{ki}$ may be regarded as direction cosines specifying the direction of the vector $\mathbf{l}_i$ associated with the root $v_i$. Now,


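since R is symmetric (i.e., R = R’), the roots v must be real, and the vectors $\mathbf{l}$ mutually orthogonal: for

$$R\mathbf{l}_i = v_i\mathbf{l}_i \quad\text{and}\quad R\mathbf{l}_j = v_j\mathbf{l}_j;$$

hence,

$$\mathbf{l}_j'\,v_i\,\mathbf{l}_i = \mathbf{l}_j' R\,\mathbf{l}_i = \mathbf{l}_j' R'\,\mathbf{l}_i = (R\mathbf{l}_j)'\,\mathbf{l}_i = (v_j\mathbf{l}_j)'\,\mathbf{l}_i = v_j'\,\mathbf{l}_j'\mathbf{l}_i,$$

that is,

$$(v_i - v_j')\,\mathbf{l}_j'\mathbf{l}_i = 0.$$

Accordingly, if i = j, $v_i = v_i'$: from which it follows that $v_i$ and its conjugate $v_i'$ (both being scalars) are real and positive.* Again, if i ≠ j, $\mathbf{l}_i'\mathbf{l}_j = 0$ (unless the two latent roots happen to be equal). We thus have

$$\mathbf{l}_i'\mathbf{l}_j = \sum_k l_{ki}\,l_{kj} = \begin{cases} 1, & i = j,\\ 0, & i \neq j. \end{cases} \tag{7}$$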

Since the values $v_i$ are obtained by solving what is known as the équation caractéristique of R, they are called its ‘characteristic’ or ‘latent roots’, and in quantum theory its Eigenwerte† (‘characteristic’ or ‘proper’ values); here, as we have seen, they are to be identified with the factor-variances. The set of values $l_{ki}$, corresponding to a particular latent root $v_i$, is termed the ‘characteristic’ or ‘latent
vector’, and in quantum theory the Eigenvektor (the ‘principal’ or 


* The meaning of v’; will be clearer, if we remove the relevant restrictions. 


The conclusions in the text only follow as these restrictions are reintroduced. 

† There is a tendency among recent German writers (10, pp. 15, 23) to designate the latent roots ‘charakteristische Zahlen’ and to keep the term ‘Eigenwerte’
for the reciprocals. (For the relations between the two, see 2, pp. 78-79). 










‘unit proper’ vector): here, as we have seen, such sets may be iden- 
tified with the direction cosines for rotating the test-axes into coin- 
cidence with the factor-axes, and their elements are therefore pro- 
portional to the corresponding saturation and regression coefficients 
contained in the factor matrix F. 

Once we have made these identifications, the problem of factor 
analysis in psychology is seen to be precisely analogous to that of re- 
ducing a symmetrical matrix to its canonical form in matrix algebra 
and of determining the ‘eigen values’ and the ‘eigen states’ of a real ob- 
servable in quantum theory.* Incidentally, it may be noted that the re- 
sults thus reached are independent of all reference to the mode in 
which the frequencies of the variables are distributed: no assumption 
of normality has been necessary. Thus, when the algebraic results are 
expressed in geometrical terms, the ellipsoids obtained by treating the 
measurements as samples of continuous variables may be regarded 
simply as ‘strain ellipsoids’, indicating the relations that arise from 
changing the scale of measurement differently in different directions, 
and not as the frequency ellipsoids arising from a normal distribution. At the same time, if problems of probability arise (as where
the errors of prediction, or the sum of their squares, are to be mini- 
mized), the frequency-interpretation is still available, and the method 
at once gives an appropriate answer. 

Let us now write $H_i$ for the matrix of rank one obtained by multiplying the single column vector $\mathbf{f}_i$ by its transpose, and similarly let us write $E_i$ for the matrix obtained by multiplying the latent column vector $\mathbf{l}_i$ by its transpose: thus

$$E_i = \mathbf{l}_i\mathbf{l}_i', \qquad H_i = \mathbf{f}_i\mathbf{f}_i',$$

and

$$H_i = v_i E_i.$$

Then the final expansion of R may be expressed

$$R = H_1 + H_2 + \cdots + H_n \tag{8}$$

$$R = v_1E_1 + v_2E_2 + \cdots + v_nE_n. \tag{9}$$
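For two tests with correlation r, the latent roots and vectors of R are known in closed form: $v_1 = 1 + r$ with $\mathbf{l}_1 = (1, 1)/\sqrt{2}$, and $v_2 = 1 - r$ with $\mathbf{l}_2 = (1, -1)/\sqrt{2}$. The sketch below builds the unit hierarchies $E_i$ and the hierarchies $H_i = v_iE_i$ for a hypothetical r = 0.6 and checks that their sum (8)/(9) reproduces R. The function name is ours.

```python
import math


def canonical_expansion_2x2(r):
    """Latent roots v_i, vectors l_i, unit hierarchies E_i and hierarchies H_i
    of the correlation matrix [[1, r], [r, 1]] (closed form, r > 0)."""
    s = 1 / math.sqrt(2)
    roots = [1 + r, 1 - r]
    vectors = [[s, s], [s, -s]]
    E = [[[x * y for y in l] for x in l] for l in vectors]                 # E_i = l_i l_i'
    H = [[[v * e for e in row] for row in Ei] for v, Ei in zip(roots, E)]  # H_i = v_i E_i
    return roots, vectors, E, H


roots, vectors, E, H = canonical_expansion_2x2(0.6)
R_rebuilt = [[H[0][i][j] + H[1][i][j] for j in range(2)] for i in range(2)]
print(R_rebuilt)   # recovers [[1, 0.6], [0.6, 1]] up to rounding
```

Here $\mathbf{f}_1 = \sqrt{v_1}\,\mathbf{l}_1$, so both tests load $\sqrt{(1+r)/2}$ on the first factor, and every element of $H_1$ equals $(1+r)/2$.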


Here the H’s represent a series of ‘hierarchies’ (in Spearman’s sense) ; 


*In quantum mechanics, complex numbers are used, and the matrices are 
Hermitian, whereas in factor-analysis the elements in the matrices are all real 
numbers (though in my view many problems might receive a more general solu- 
tion if complex numbers were introduced). Nevertheless, since the field of real 
numbers may be legitimately treated as part of the field of complex numbers, the 
real axisymmetric matrices, formed by the correlation and covariance tables of 
psychology, may be regarded simply as special cases of Hermitian matrices, in 
which the imaginary part of the complex number vanishes. 











CYRIL BURT 157 


and, in view of the properties that I shall presently demonstrate, the 
E’s may be termed ‘latent’ or ‘unit hierarchies’. The final expression 
given by equation (9) may be termed the canonical expansion of the 
correlation matrix. I thus conceive the fundamental problem of factor-analysis to consist in expanding R in terms of its latent roots and
latent hierarchies in accordance with this equation. The reduction 
more usually made — to ‘manifest hierarchies’, as they might be 
called — in accordance with equations (4) and (8), seems only an 
incomplete performance of the essential task. 


III. The Reduced Hierarchy and its Properties. 


These ‘unit’ or ‘latent hierarchies’ have a number of peculiar 
properties; and these properties in turn will not only reveal more 
explicitly the striking analogies to which I have alluded, but also fa- 
cilitate the deduction of useful formulae both for practical and the- 
oretical work. 

(i). By definition each matrix E is constructed by post-multiplying the column vector $\mathbf{l}$ by its transpose. It is therefore (like R and H) symmetric: i.e.,

$$E' = (\mathbf{l}\mathbf{l}')' = \mathbf{l}\mathbf{l}' = E. \tag{10}$$


(ii). Since the vector $\mathbf{l}$ is by hypothesis a matrix of one column only and therefore of rank one, it follows that E (like H) is a matrix of rank one; all its rows are linearly dependent, and the whole of the $n^2$ elements can be deduced from a set of n values only. In explicit notation, since

$$e_{hk} = l_h l_k, \qquad (g, h, i, k = 1, 2, \ldots, n) \tag{11}$$

E therefore obeys the tetrad difference criterion; is strictly hierarchical; and, being singular, can possess no inverse.
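The tetrad property asserted in (ii) follows in one line from (11); we spell it out here because the text states it without proof. For any four indices g, h, i, k:

```latex
e_{gh}\,e_{ik} - e_{gk}\,e_{ih} = (l_g l_h)(l_i l_k) - (l_g l_k)(l_i l_h) = 0,
```

so every tetrad difference of E vanishes. Equivalently, every 2 × 2 minor of E is zero, which is another way of saying that E has rank one.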

(iii). It follows, too, that E cannot be expressed as the sum of two other hierarchies (say F = f f’ and G = g g’) unless both these hierarchies are themselves multiples of E: for, consider any tetrad such as

$$\frac{f_1f_2 + g_1g_2}{f_1^2 + g_1^2} = \frac{f_2^2 + g_2^2}{f_1f_2 + g_1g_2}.$$

Write

$$\frac{f_2}{f_1} = p \quad\text{and}\quad \frac{g_2}{g_1} = q.$$

Then

$$\frac{pf_1^2 + qg_1^2}{f_1^2 + g_1^2} = \frac{p^2f_1^2 + q^2g_1^2}{pf_1^2 + qg_1^2},$$

i.e.,

$$(p^2 + q^2)\,f_1^2g_1^2 = 2pq\,f_1^2g_1^2,$$

so that $(p - q)^2 = 0$. Hence

$$p = q,$$

and

$$\frac{f_2}{g_2} = \frac{f_1}{g_1}.$$

Similar relations hold for all the other elements in the vectors f and g. Equation (11) thus implies that the matrices $E_i$ are ultimate or ‘pure’; i.e., they cannot themselves be further decomposed, as R has been decomposed into the sum of the E’s.

(iv). From (11) we have

$$\sum_h e_{hk} = l_k \sum_h l_h. \tag{14}$$

Thus the sums of the columns (or of the rows) of any given matrix E are (like the elements in each column or row itself) proportional to the elements of the latent vector from which E has been constructed.

The total of the elements in E is therefore

$$\sum_h \sum_k e_{hk} = \Big(\sum_h l_h\Big)^2. \tag{15}$$

And summing these totals for all the n E-matrices we have

$$\Big(\sum_h l_{h1}\Big)^2 + \Big(\sum_h l_{h2}\Big)^2 + \cdots + \Big(\sum_h l_{hn}\Big)^2 = \sum_i l_{1i}^2 + \sum_i l_{2i}^2 + \cdots + \sum_i l_{ni}^2 = n. \tag{16}$$


In matrix notation the result may be reached more simply as follows:

$$E_1 + E_2 + \cdots + E_n = \mathbf{l}_1\mathbf{l}_1' + \mathbf{l}_2\mathbf{l}_2' + \cdots + \mathbf{l}_n\mathbf{l}_n' = LL' = I. \tag{17}$$
These results (equations 15, 16, 17) will incidentally provide a useful
means of checking the arithmetical calculations. 
(v). Since L’L also equals I, we have

$$\sum_h e_{hh} = \sum_h l_{hi}^2 = 1; \tag{18}$$

i.e., the sum of the diagonals in any matrix $E_i$ (its ‘trace’) is equal to unity. These properties may be conveniently summed up by describing E as a ‘unit hierarchy’.
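Since the text recommends (15) to (17) as arithmetical checks, here is a small sketch that applies them, together with (14) and (18), to the unit hierarchies of a hypothetical 3 × 3 orthogonal matrix L. The matrix and the function name are ours, chosen only because their entries are easy to verify by hand.

```python
import math

# A hypothetical 3 x 3 orthogonal matrix L; column i is the latent vector l_i.
a, b, c = 1 / math.sqrt(3), 1 / math.sqrt(2), 1 / math.sqrt(6)
L = [[a, b, c],
     [a, -b, c],
     [a, 0.0, -2 * c]]
n = len(L)


def unit_hierarchy(L, i):
    """E_i = l_i l_i', built from column i of L as in (11)."""
    col = [row[i] for row in L]
    return [[x * y for y in col] for x in col]


E = [unit_hierarchy(L, i) for i in range(n)]
totals = [sum(sum(row) for row in Ei) for Ei in E]                    # (15)
traces = [sum(Ei[h][h] for h in range(n)) for Ei in E]                # (18)
sum_E = [[sum(Ei[h][k] for Ei in E) for k in range(n)] for h in range(n)]  # (17)
print(totals, sum(totals), traces)
```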

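(vi). A still more peculiar property is that

$$E_i^2 = (\mathbf{l}_i\mathbf{l}_i')(\mathbf{l}_i\mathbf{l}_i') = \mathbf{l}_i(\mathbf{l}_i'\mathbf{l}_i)\mathbf{l}_i' = \mathbf{l}_i\mathbf{l}_i' = E_i,$$

and generally

$$E_i^{\,m} = E_i \quad (m = 1, 2, \ldots); \tag{19}$$

that is, each latent hierarchy is ‘idempotent’.

(vii). Again

$$E_iE_j = (\mathbf{l}_i\mathbf{l}_i')(\mathbf{l}_j\mathbf{l}_j') = \mathbf{l}_i(\mathbf{l}_i'\mathbf{l}_j)\mathbf{l}_j' = 0 \quad (i \neq j); \tag{20}$$

that is, any two latent hierarchies derived from the same expansion are mutually orthogonal.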
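The idempotence (19) and mutual orthogonality (20) can be confirmed numerically in the same way. The sketch below uses a hypothetical set of three orthonormal latent vectors and helper names of our own.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def outer(x):
    """The rank-one matrix x x'."""
    return [[p * q for q in x] for p in x]


# Hypothetical orthonormal latent vectors l_1, l_2, l_3.
s3, s2, s6 = 1 / math.sqrt(3), 1 / math.sqrt(2), 1 / math.sqrt(6)
vectors = [[s3, s3, s3], [s2, -s2, 0.0], [s6, s6, -2 * s6]]
E = [outer(l) for l in vectors]

E1_squared = matmul(E[0], E[0])     # (19): should equal E_1
E1_E2 = matmul(E[0], E[1])          # (20): should be the zero matrix
print(E1_squared, E1_E2)
```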

(viii). The latent hierarchies, however, may be derived without reference to the latent vectors: for, on examining the form of the characteristic matrix, it is easily seen that, provided the latent roots are all distinct,

$$E_1 = \frac{(R - v_2I)(R - v_3I)\cdots(R - v_nI)}{(v_1 - v_2)(v_1 - v_3)\cdots(v_1 - v_n)}, \tag{21}$$

and similarly for all the matrices $E_i$.

(ix). Finally, it may be of interest to consider how the variance for a given factor changes with changes in the several correlations. From equation (5) we have $V = L'RL$; hence each $v_j$ can be obtained as a sum of terms such as $l_{ij}\,l_{kj}\,r_{ik}$. By partial differentiation for each particular correlation, $r_{ik}$, we have

$$\frac{\partial v_j}{\partial r_{ik}} = l_{ij}\,l_{kj}. \tag{22}$$

Thus $E_j$ proves to be the result of applying to $v_j$ a matrix differential operator whose ik-th (and ki-th) element is

$$\frac{\partial}{\partial r_{ik}}.$$
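Both (21) and (22) can be checked on a 2 × 2 symmetric matrix, whose latent roots have a closed form. With n = 2, (21) reduces to $E_1 = (R - v_2I)/(v_1 - v_2)$. The sketch below builds that matrix and compares its elements with finite-difference derivatives of $v_1$. The matrix and the function names are hypothetical. We read (22) as treating $r_{ik}$ and $r_{ki}$ as separate arguments. In a symmetric matrix both move together, so the derivative with respect to their common value is the sum of the two terms, $2e_{12}$. This reading is our interpretation.

```python
import math


def latent_roots(a, b, d):
    """Latent roots (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    centre, radius = 0.5 * (a + d), math.sqrt(0.25 * (a - d) ** 2 + b * b)
    return centre + radius, centre - radius


def first_unit_hierarchy(a, b, d):
    """E_1 from (21) with n = 2: (R - v_2 I) / (v_1 - v_2)."""
    v1, v2 = latent_roots(a, b, d)
    return [[(a - v2) / (v1 - v2), b / (v1 - v2)],
            [b / (v1 - v2), (d - v2) / (v1 - v2)]]


a, b, d = 1.0, 0.5, 0.8            # hypothetical symmetric matrix
E1 = first_unit_hierarchy(a, b, d)
step = 1e-6

# (22) for a diagonal element: dv_1/dr_11 should equal l_11^2 = e_11.
dv_da = (latent_roots(a + step, b, d)[0] - latent_roots(a - step, b, d)[0]) / (2 * step)

# r_12 and r_21 move together here, so the derivative is the sum of two terms of (22): 2 e_12.
dv_db = (latent_roots(a, b + step, d)[0] - latent_roots(a, b - step, d)[0]) / (2 * step)
print(E1, dv_da, dv_db)
```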
Equations (21) and (22) show that what we have been calling 
the latent hierarchies of the matrix R are simply the numerators of 
its ‘partial resolvents’.* Equations (19) and (20) further indicate that each $E_i$ has all the essential properties of what, in the theory of groups, is termed a ‘selective operator’, or (when the groups can be represented by linear transformations) a ‘projective operator’, i.e., an operator which projects any given vector on to the unit vector $\mathbf{l}_i$ defined by the corresponding latent root $v_i$.† A set of selective operators satisfying equation (17) as well as (19) and (20) is called a ‘spectral set’; ‘it analyzes a mixed aggregate into pure constituents as light is analyzed by a prism or grating into a spectrum of pure colours’.‡
Including (13) we may say that between them the four equations se- 
cure that the set of operators which represent the several factors 
shall be (a) exhaustive, (b) non-overlapping, (c) pure (i.e., inde- 
composable), and (d) idempotent (i.e., such as to yield the same re- 
sult no matter how many times the process of ‘selection’ is applied 


* The 'resolvent' of $R$ is the reciprocal of $\lambda I - R$, where $\lambda$ may have any value other than the latent roots (cf. Cullis, 11, pp. 316 et seq.; Turnbull & Aitken, 12, pp. 160, 163, 184).

† For the formal definition of a projective operator see Weyl, (13, p. 23),
or Neumann, (14, pp. 39-41) and Satz, (12), and, for a fuller and more technical 
discussion, Stone, (15, chapter IV, on ‘Resolvents, Spectra, and Reducibility’). 
I know of no discussion bringing together all the properties enumerated above. 

‡ The name has arisen from the fact that in the realm of atomic physics the most simple and direct experimental illustrations of the 'laws of measurement' (laws expressed by theorems very similar to those set out above) are to be found in the spectral analysis of molecular rays along the lines of the classical experiments of Stern and Gerlach (13). In quantum theory a 'complete observation' of
a given mixed or inhomogeneous aggregate or assembly is regarded as a species 
of spectral analysis in which the given aggregate is resolved into a number of 
constituent parts which are relatively pure or homogeneous with respect to some 
particular variable. At first sight, the physicist’s problem as thus stated seems 
closely analogous to that of observing and measuring sample assemblies of men- 
tal traits in sample assemblies of the population with a view to discovering the 
distribution of relatively pure factors; and the more recent extension of the 
spectral theory to continuous as distinct from discrete spectra is reminiscent of 
the problems that arise in considering the continuity or discontinuity of the 
fundamental mental traits. There are, of course, differences of aim as well as 
resemblances (e.g., in mental testing we are generally more interested in meas- 
uring the individual than the assembly) ; but into these it is impossible to enter 
here. 

I may, however, point out that the analogy goes deeper still. Of late a good 
deal of interest has been aroused by the ‘phenomena of constancy’ in all pro- 
cesses of perception (cf. Katz, 17). Now, if we recognize that, in psychology as 
in modern physics, an ‘observation’ is to be specified not by a quantity but by a 
structure, not by the correspondence between a single sensation and a single 
stimulus, but by the correspondence between patterns or matrices, then we may 
regard the 'real object' as specified by the canonical form of that matrix (containing the Eigenwerte) which is implicitly thought of as providing a constant and fundamental standard, as in fact a kind of Platonic eidos. As I walk round my study table, I am aware, not of the varying two-dimensional versions of it that reach my retina in changing perspective, but of a kind of standardized four-square table in three-dimensional space. Now if, instead of the isolated stimuli,
the entire matrix of stimuli is treated as the unit, the transformations of the 
pattern are quite as simple to examine as the transformations of the isolated 
stimulus, and the inversion of such transformations simpler still. It may be 
added that the postulate that introspection shall not distort the state observed is 
expressed by the requirement that in such cases the observation shall be repre- 
sented by an idempotent operator. 
and reapplied). Or, to adopt the convenient language of geometry,
we may say that, in factor analysis as in quantum theory, the intro- 
duction of these projective operators enables us to resolve “the total 
representational n-dimensional space” (i.e., what in correlational 
work has been called the “common factor space” (Thurstone, 7, p. 
69)) into n “characteristic linear subspaces”, each representing an 
independent factor.* The success which has attended this method in 
recent developments of physics is, I think, a guarantee of its value in 
solving many corresponding problems in psychology. 
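A small exact check of equation (21) and of the spectral-set properties may make them concrete. The sketch assumes hypothetical factors: three orthonormal direction-cosine vectors with rational entries and latent roots 2, 0.7 and 0.3. From these it builds $R = v_1 l_1 l_1' + v_2 l_2 l_2' + v_3 l_3 l_3'$; all helper names are illustrative. Python's `fractions.Fraction` keeps the arithmetic exact, so each property can be tested by plain equality.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def outer(u):
    return [[ui * uj for uj in u] for ui in u]

def add(A, B, s=1):
    """A + s * B for square tables."""
    return [[A[i][j] + s * B[i][j] for j in range(len(A))] for i in range(len(A))]

def scale(A, s):
    return [[s * x for x in row] for row in A]

def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

# Hypothetical factors: orthonormal direction cosines l_i and latent roots v_i.
L = [[Fr(1, 3), Fr(2, 3), Fr(2, 3)],
     [Fr(2, 3), Fr(1, 3), Fr(-2, 3)],
     [Fr(2, 3), Fr(-2, 3), Fr(1, 3)]]
v = [Fr(2), Fr(7, 10), Fr(3, 10)]
n = len(v)

R = [[Fr(0)] * n for _ in range(n)]
for vi, li in zip(v, L):
    R = add(R, scale(outer(li), vi))

def latent_hierarchy(i):
    """E_i from equation (21): product over j != i of (R - v_j I) / (v_i - v_j)."""
    E = identity(n)
    for j in range(n):
        if j != i:
            factor = scale(add(R, scale(identity(n), v[j]), -1), 1 / (v[i] - v[j]))
            E = matmul(E, factor)
    return E

E = [latent_hierarchy(i) for i in range(n)]
assert all(E[i] == outer(L[i]) for i in range(n))            # E_i = l_i l_i'
assert all(matmul(E[i], E[i]) == E[i] for i in range(n))     # idempotent, (19)
assert all(matmul(E[i], E[j]) == scale(E[i], 0)              # non-overlapping, (20)
           for i in range(n) for j in range(n) if i != j)
total = E[0]
for Ei in E[1:]:
    total = add(total, Ei)
assert total == identity(n)                                   # exhaustive, (17)
print("spectral set verified")
```

Because the arithmetic is exact, any failure of (17), (19) or (20) would show up as a genuine inequality rather than a rounding discrepancy.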


IV. Corollaries for Practical Work. 


Apart from the theoretical suggestiveness of this result, the properties just enumerated yield several important results that directly bear on the practical task of arithmetical evaluation. To determine $v_i$ and $H_i$, the direct method, as we have seen, would be to solve the characteristic equation for the $v_i$ and then substitute the values for the $v_i$ in turn and solve for the $l_i$. If, however, the number of traits or tests is large, the labour of computing the determinants is prohibitive. It is for this reason, I take it, that so many different 'methods of factor analysis' have been proposed: these claim to give, with relatively little trouble, not an exact, but a sufficiently approximate result. Now I have elsewhere (2) endeavoured to show that, by expanding the higher powers of $R$ in terms of its latent hierarchies,


* Cf. Weyl, (11, p. 22). This mode of formulation enables us to express many 
other subsidiary problems that arise in psychological analysis in a shape iden- 
tical with that of problems that have already been solved in the field of quan- 
tum-mechanics. One important problem is to ascertain whether the same set of 
factors is common to different sets of tests (with the same tested persons) or 
to different samples of the population (with the same set of tests). If $S_1$, $S_2$, $R_1$ and $R_2$ represent the scores and correlations from the two sets, then, when the factors (and therefore their direction cosines $L$) are the same, we have at once $R_1 R_2 = R_2 R_1$, and (since the factor measurements in $P_1$ and $P_2$ will tend to be uncorrelated) $S_1 S_2' = S_2 S_1'$. This 'symmetry criterion' has already been
applied with success by Miss Williams and Miss Davies in the researches on per- 
sonal types. Again, another problem is to ascertain the minimum number of 
independent factors requisite to account for S and R within the limits imposed 
by the probable error: this also receives a simple solution (see below). More 
particularly, in quantum theory R is said to be ‘completely reducible’ if only 
two sub-spaces are required: this situation is analogous to that postulated by 
the bifactor hypothesis in psychology; and the conditions deduced in physical 
theory can be carried over to psychological theory. Again, to allow for ‘chance’ 
factors and for the ‘errors’ of observation or measurement, space of infinite di- 
mensions will in the end be required: but it has been shown in quantum theory 
that, in general, theorems proved for finite matrices may be assumed true for in- 
finite, and the recent extensions of the theory have revealed both the justifications 
and the limitations of such an assumption (cf. Hilbert and Courant, 10, pp. 128- 
9). Once more, the treatment of issues involving the determination of probabil- 
ities (‘laws of transition’ and ‘exchange relations’) could be adapted with but 
little modification. 
one may obtain, not merely a reconciliation of the rival methods so
far suggested, but also a simple modification which will rapidly yield 
a result of any desired exactitude. 

If we begin by squaring the covariance matrix $R$, we have

$$\begin{aligned} R^2 &= (v_1 E_1 + v_2 E_2 + \cdots + v_n E_n)^2 \\ &= v_1^2 E_1^2 + v_2^2 E_2^2 + \cdots + v_n^2 E_n^2 + 2 v_1 v_2 E_1 E_2 + 2 v_1 v_3 E_1 E_3 + \cdots + 2 v_{n-1} v_n E_{n-1} E_n \\ &= v_1^2 E_1 + v_2^2 E_2 + \cdots + v_n^2 E_n, \end{aligned} \qquad (23)$$

since by equation (19) $E_1^2 = E_1, \ldots, E_n^2 = E_n$, and by equation (20) $E_1 E_2 = E_1 E_3 = \cdots = E_{n-1} E_n = 0$. On continuing the self-multiplication of the matrix, we shall evidently obtain for any* power of $R$:

$$R^m = v_1^m E_1 + v_2^m E_2 + \cdots + v_n^m E_n. \qquad (24)$$

Dividing through by $v_1^m$, we have

$$R^m / v_1^m = E_1 + (v_2 / v_1)^m E_2 + \cdots + (v_n / v_1)^m E_n;$$

and, since $v_1 > v_2 > \cdots > v_n$, we can, by taking $m$ large enough, obtain

$$R^m / v_1^m = E_1 \qquad (25)$$

to as close an approximation as we may desire. Similarly, by taking $(m - 1)$ large enough, we can obtain

$$R^m / v_1^{m-1} = v_1 E_1 = H_1,$$

that is,

$$R^m = v_1^{m-1} H_1. \qquad (26)$$


It is thus evident that, with a sufficient number of self-multiplications, 
any symmetrical matrix, such as a table of correlations or covari- 
ances, can be reduced as closely as we wish to a matrix of rank one, 
i.e., to a Spearman hierarchy. To factorize such a matrix is now ex- 
ceedingly simple: for, since its rank is only one, it contains but one 
common factor. 
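The reduction to a hierarchy can be watched directly. The following sketch is an illustration, not the author's working routine. It squares Kelley's table (the correlations printed in the worked example later in this article) four times and reports, for each power, the largest second-order minor relative to its diagonal product. The yardstick `rank_one_defect` is our own choice; it vanishes for an exact Spearman hierarchy.

```python
def matmul(A, B):
    """Product of two square tables (lists of rows)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def rank_one_defect(M):
    """Largest relative second-order minor |M_ii M_jj - M_ij^2| / (M_ii M_jj).
    It is zero when M is an exact Spearman hierarchy (rank one)."""
    n = len(M)
    return max(abs(M[i][i] * M[j][j] - M[i][j] ** 2) / (M[i][i] * M[j][j])
               for i in range(n) for j in range(i + 1, n))

# Kelley's table, with the diagonal entries as printed in the worked example.
R = [[1.00, 0.70, 0.26],
     [0.70, 0.75, 0.45],
     [0.26, 0.45, 0.35]]

defects = []
M, m = R, 1
while m <= 16:
    defects.append((m, rank_one_defect(M)))
    M, m = matmul(M, M), 2 * m      # each squaring doubles the power of R

for m, d in defects:
    print(f"R^{m:<2d}: largest relative minor = {d:.2e}")
```

By equation (24) the minors should shrink roughly like $(v_2 / v_1)^m$, so the decline becomes very rapid once $m$ is moderately large.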

To determine the saturation coefficients which form the vector bL, 
we may apply equation (14). Their proportionate values can thus be 
obtained by simply adding the columns of the product matrix $R^m$; and on normalizing the proportionate values we reach $l_1$.
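Adding the columns of a high power and normalizing can be sketched as follows, again on Kelley's table. The sketch assumes the diagonal entries as printed in the worked example and uses four squarings ($m = 16$), an arbitrary choice. Simple summation on $R$ itself is shown for comparison; the helper names are illustrative.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def column_sums(M):
    return [sum(row[j] for row in M) for j in range(len(M[0]))]

def normalize(u):
    length = math.sqrt(sum(x * x for x in u))
    return [x / length for x in u]

R = [[1.00, 0.70, 0.26],
     [0.70, 0.75, 0.45],
     [0.26, 0.45, 0.35]]

simple = normalize(column_sums(R))    # 'simple summation' on R itself
M = R
for _ in range(4):                    # four squarings give R^16
    M = matmul(M, M)
l1 = normalize(column_sums(M))        # columns of the product matrix R^m, normalized

print("simple summation:", [round(x, 4) for x in simple])
print("from R^16:       ", [round(x, 4) for x in l1])
```

The vector from $R^{16}$ satisfies $R l_1 = v_1 l_1$ to within rounding, whereas the simple-summation vector does not, which is the point of working with higher product-moments.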

Since R = S S’ (where S is the matrix of scores), the elements 


* This includes negative indices: so that we have here a rapid means of 
calculating the inverse of R, which is often wanted for the regression equations. 
Note also that, since

$$R R^{-1} = E_1 + E_2 + \cdots + E_n \quad \text{and} \quad R R^{-1} = I,$$

we have a simplified proof of equation (17).
of $R$ may be termed 'product-moments of the first order'; those of $R^m$ may accordingly be termed 'product-moments of the $m$-th order'. When $v_2, \ldots, v_n$ are absolutely as well as relatively negligible, i.e., when $R$ itself is virtually hierarchical, the summation formula may be applied to $R$ as it stands:* otherwise it must be applied to 'higher product moments' instead of to the observed correlations: i.e., instead of

$$f_0' R = c_1 f_1' \qquad (\text{'simple summation'}) \qquad (28)$$

we may take

$$f_0' R^m = c_m f_m' = c_m f_\infty', \qquad (29)$$

where $f_j$ denotes the $j$-th approximation to the saturation coefficients for the first factor; $f_0$ denotes the summation operator $\{1, 1, \ldots, 1\}$; $f_\infty$ denotes the true value of the saturation coefficients, i.e., the vector $\{r_{1g}, \ldots, r_{ng}\}$, and can be determined as exactly as we desire by taking $m$ sufficiently large; and $c_j$ denotes a constant that can be explicitly determined if required.

Now the summation of a power of $R$ ($R^4$, say) may be expressed $f_0'(R \times R \times R \times R)$. But this is clearly equal to $(f_0' R)(R \times R \times R)$. It follows that $f_0' R^m$ may be computed in two ways: either by table-by-table multiplication or by column-by-table multiplication. Thus, we may begin by multiplying the matrix by itself, repeat the multiplications again and again to the $m$-th factor, and then add the columns of the final product; or we may begin by adding the columns of the initial table, then use these sums to weight the coefficients in each row, and add the columns again, repeating the additions again and again to the $m$-th sum. Incidentally, since $f_0'(R \times R \times R \times R) = f_0'(R^2)^2$, we may conveniently abridge the first method by taking $m$ to be a power of 2, say $2^p$; for, if we square the product of each squaring, we shall make $p$ table-by-table multiplications furnish the same result as the full $m$-fold product.
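The two routes to $f_0' R^m$, and the shortcut of repeated squaring, can be compared in a short sketch. It is illustrative only: `row_times_table` is a hypothetical helper standing for one column-by-table step (weight each row, then add the columns), and $m = 8 = 2^3$ is an arbitrary choice. Kelley's table is again assumed as data.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def row_times_table(w, M):
    """One column-by-table step: weight row i of M by w[i], then add the columns (w'M)."""
    n = len(M)
    return [sum(w[i] * M[i][j] for i in range(n)) for j in range(n)]

R = [[1.00, 0.70, 0.26],
     [0.70, 0.75, 0.45],
     [0.26, 0.45, 0.35]]
p = 3
m = 2 ** p

# Column-by-table: start from f0 = (1, ..., 1) and re-weight m times.
by_column = [1.0] * len(R)
for _ in range(m):
    by_column = row_times_table(by_column, R)

# Table-by-table, abridged: p squarings give R^(2^p); then add the columns once.
M = R
for _ in range(p):
    M = matmul(M, M)
by_table = row_times_table([1.0] * len(R), M)

print(by_column)
print(by_table)
```

Here three table-by-table multiplications stand in for the seven needed to form $R^8$ one factor at a time, and the two vectors agree up to floating-point rounding.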


* In an early paper (18, 1917) I showed that, if $R$ is hierarchical, the saturation coefficients may readily be determined by the 'simple summation formula'

$$r_{ig} = \frac{\sum_j r_{ij}}{\sqrt{\sum_i \sum_j r_{ij}}}. \qquad (27)$$

Thurstone, however, (7, p. 94, equation (13)), has more recently taken the formula as the basis of his centroid method, and treated it as applicable to any form of correlation table. I myself have always maintained that, so far as $R$ is not hierarchical, the formula could be regarded as furnishing a first approximation only; and in another article I have endeavoured to demonstrate at length, and with arithmetical illustrations, that the saturation coefficients or 'factor loadings' obtained by the centroid method are, in effect, simply the first of a converging sequence of values of which the limits are the values given by the direct solution of the characteristic equation.
We have, therefore, a choice between a large number of short
multiplications and a small number of long multiplications. As a 
rough rule, it may be said that table-by-table multiplication will prove 
more economical when the table is small, and may also be of service 
in the reduction of highly irregular tables: whereas table-by-column 
multiplication will prove quicker when the table is large, and may 
also be of service in the final stage of the whole calculation. Both 
these methods rest on the same principles as the procedure proposed 
by Hotelling; and variants of them are familiar to computers in other 
mathematical fields. 


Hotelling’s original iterative method (19) involves a series of table-by-col- 
umn multiplications; and more recently (20) he has suggested the matrix-squar- 
ing device as a useful preliminary to iterations with the table-by-column method. 
As my students and I have applied them, however, the principles described above 
entail several minor but important differences, which (as it appears to us) at 
once diminish the labour and enhance the accuracy of the final results. Hotelling’s 
instructions are as follows: “The process should be started with trial values of 
one digit each . . . as judged by inspection. Each digit should be accurately determined by repetition until stationary values are reached before the calculations
are carried to another place” (19, pp. 431-2; cf. 20, p. 33). With the table-by- 
column method, the procedure* required by the foregoing proof is as follows: (i) 
instead of more or less arbitrary trial-values, judged by inspection, we commence 
always with the figures obtained by simple summation, and use these as the first 
set of multipliers or weights; (ii) at every stage we treat the sums not as sug- 
gesting new ‘trial values’ to be chosen by further comparison or an intelligent 
guess, but as precise weights, working with the full number of significant digits 
from the start: this renders the procedure perfectly mechanical at every step, 
and does not throw upon the inexperienced calculator the task of judging what 
corrections he shall try in the hope of jumping at once to a close approximation; 
(iii) when the full figures are retained, then, as we keep multiplying, the way in 
which the product sums are approximating to the final values becomes obvious 
and it is seen that the differences are apparently diminishing according to a regu- 
lar rule: hence, after three or four iterations have been calculated explicitly, the 
results of the subsequent iterations may be obtained by the method of differences; 
and, in most cases, the final values may be obtained forthwith by extrapolation. 

So far as theory is concerned, instead of regarding the table-by-table method 
as a modification of or a preliminary to the table-by-column method, I regard 
the principle of self-multiplication of the entire table as fundamental. In either 
case, however, the point to stress is that we are not dealing merely with alternative modes of approximation, but with alternative procedures that lead to identical results. This has been so often overlooked or questioned† that I may be pardoned for giving a concrete example. I take the table used to illustrate the two
procedures in my previous article, namely, that constructed by Kelley to compare 
his own method with that of Thurstone. 


    Σ_j r_ij    r_i1   r_i1 Σ_j r_ij    r_i2   r_i2 Σ_j r_ij    r_i3   r_i3 Σ_j r_ij
      1.96      1.00      1.9600         .70      1.3720         .26       .5096
      1.90       .70      1.3300         .75      1.4250         .45       .8550
      1.06       .26       .2756         .45       .4770         .35       .3710
    Totals      1.96      3.5656        1.90      3.2740        1.06      1.7356

* For illustrations cf. (2, p. 91); (3, pp. 185-8, Tables IV and V). 

† An instance in which this principle has apparently been overlooked is the use of the simple summation formula (the so-called centroid formula, described above, equation 27) to determine the saturation coefficients. Here the sum of the correlations for each test must be the same as the correlation of each test with the sum. Hence, as we have seen, the saturation coefficients thus computed simply treat the general factor as identical with the simple sum or average of the test scores (1, p. 287, footnote 1).
The totals $\sum_i r_{ij} \sum_k r_{ik}$ form the vector $(f_0' R) \times R = f_0' R^2$. If now we square $R$ first of all and then add, we obtain

    Σ_k r_ik r_k1    Σ_k r_ik r_k2    Σ_k r_ik r_k3
       1.5576           1.3420            .6660
       1.3420           1.2550            .6770
        .6660            .6770            .3926
    Totals
       3.5656           3.2740           1.7356

Here the totals $\sum_i \sum_k r_{ik} r_{kj}$ form the vector $f_0'(R \times R) = f_0' R^2$. It will be noted that no decimal figures have been omitted, and the results are exact.
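The arithmetic of the two tables can be reproduced exactly. Python's `decimal.Decimal` keeps every decimal figure, matching the remark that none have been omitted. The sketch assumes nothing beyond the printed correlations.

```python
from decimal import Decimal as D

R = [[D("1.00"), D("0.70"), D("0.26")],
     [D("0.70"), D("0.75"), D("0.45")],
     [D("0.26"), D("0.45"), D("0.35")]]
n = len(R)

# First route: add the columns, use the sums as row weights, and add again.
col = [sum(R[i][j] for i in range(n)) for j in range(n)]            # 1.96, 1.90, 1.06
route_a = [sum(col[i] * R[i][j] for i in range(n)) for j in range(n)]

# Second route: square the table first, then add the columns of R^2.
R2 = [[sum(R[i][k] * R[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
route_b = [sum(R2[i][j] for i in range(n)) for j in range(n)]

print(route_a)
print(route_b)
```

Both routes yield the totals 3.5656, 3.2740 and 1.7356 with no rounding at any step.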


When $f$ has been thus determined as accurately as is required, we shall have $f_m = f_\infty$ to the number of digits retained, and

$$f_0' R^m = c_m f_\infty'.$$
To evaluate $v_1$, two methods are available.

(i) Since, within the limits of our approximation, $R^m = v_1^m E_1$, and since by equation (18) the sum of the diagonals in $E_1$ is 1, it follows that

$$\text{the sum of the diagonals in } R^m = v_1^m. \qquad (30)$$

(ii) Since, too, $R^m = v_1^m E_1$, we have

$$v_1 = \frac{f_0' R^m f_0}{f_0' R^{m-1} f_0} = \frac{\text{Total of all coefficients in } R^m \text{ (i.e., of weighted sums)}}{\text{Total of all coefficients in } R^{m-1} \text{ (i.e., of weights)}}. \qquad (31)$$
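Both ways of evaluating $v_1$ can be tried on Kelley's table. This sketch assumes $m = 16$, reached by four squarings for equation (30). For equation (31) it builds the totals $f_0' R^k f_0$ by column-by-table steps. The helper names are our own.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def row_times_table(w, M):
    """One column-by-table step: weight row i of M by w[i], then add the columns."""
    n = len(M)
    return [sum(w[i] * M[i][j] for i in range(n)) for j in range(n)]

R = [[1.00, 0.70, 0.26],
     [0.70, 0.75, 0.45],
     [0.26, 0.45, 0.35]]
n, m = len(R), 16

# Equation (30): the diagonal sum of R^m is (very nearly) v1^m.
M = R
for _ in range(4):                     # R^16
    M = matmul(M, M)
v1_trace = sum(M[i][i] for i in range(n)) ** (1.0 / m)

# Equation (31): grand total of R^m divided by grand total of R^(m-1).
w = [1.0] * n
totals = []                            # totals[k - 1] = f0' R^k f0
for _ in range(m):
    w = row_times_table(w, R)
    totals.append(sum(w))
v1_ratio = totals[-1] / totals[-2]

print(f"v1 by (30): {v1_trace:.8f}   v1 by (31): {v1_ratio:.8f}")
```

With this table the two estimates agree to many more places than the correlations themselves warrant.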





To keep $c$ constant, and so render the progress of the approximations
visible to the eye, we may at any stage reduce consecutive sets of 
product-sums to the same order by dividing them by the ratio given 
above (31): or, better still, we may from the very start express each 
set of weights as fractions of unity in the usual way; in this case, 
each set of product-sums will be first divided by its own total before 
being used to re-weight the original correlation matrix.* 

* I need not give further illustrative tables: in my review (3) of Kelley's book I have taken his own set of correlations by way of example; equation 30 is exemplified by the table on p. 185, and equation 31 by the table on p. 188.

How to determine the constant $c$ and to ascertain the amount of error involved at any stage of the approximation will become obvious if we set down the entire operation in matrix notation, and apply the expansion given above (equation 24). After the $m$-th summation, the vector of weights (i.e., of product-sums when reduced to fractions of their own total) will be

$$\frac{f_0' R^m}{f_0' R^m f_0} = \frac{f_0'(v_1^m E_1 + v_2^m E_2 + \cdots + v_n^m E_n)}{f_0'(v_1^m E_1 + v_2^m E_2 + \cdots + v_n^m E_n) f_0} = \frac{v_1^m t_1 + v_2^m t_2 + \cdots + v_n^m t_n}{v_1^m T_1 + v_2^m T_2 + \cdots + v_n^m T_n},$$

where $t_i$ (a vector) is used to denote the sums of the columns in $E_i$, and $T_i$ (a scalar) to denote the grand total of these sums. On dividing out, this yields (retaining only the largest terms)

$$\frac{t_1}{T_1} + \frac{v_2^m}{v_1^m} \cdot \frac{t_2 T_1 - t_1 T_2}{T_1^2} + \frac{v_3^m}{v_1^m} \cdot \frac{t_3 T_1 - t_1 T_3}{T_1^2} - \frac{v_2^{2m}}{v_1^{2m}} \cdot \frac{T_2 (t_2 T_1 - t_1 T_2)}{T_1^3}. \qquad (32)$$

The first term is the vector $r_{11} / \sum_j r_{j1},\; r_{21} / \sum_j r_{j1}, \ldots$; and $c$ therefore has now become $1 / \sum_j r_{j1}$. The second and third terms in (32) indicate the chief sources of error, and therefore the approximate rate at which the inaccuracy is diminishing. If $(v_2 / v_1)^m$ is small, $(v_3 / v_1)^m$ will usually be far smaller, and $(v_2 / v_1)^{2m}$ wholly negligible. Generally, therefore, the amount of error incurred by taking $R^m$ in place of $R^\infty$ will be nearly proportional to $(v_2 / v_1)^m$, and thus virtually diminish in geometrical progression. This is the justification for estimating the later terms of the series $f_0' R^m$ (as suggested above) by extrapolation instead of by continued iteration.* Having obtained figures as exact as are desired, the simplest method of eliminating $c$ is to normalize the results, thus obtaining the direction cosines or 'regression coefficients'. The saturation coefficients can then be obtained by multiplying the normalized figures by $\sqrt{v_1}$.
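Equation (32) predicts that the error in the weights shrinks by a factor close to $v_2 / v_1$ at each step. The sketch below checks this on Kelley's table. It takes the 40th iterate as a stand-in for the limit (an assumption that holds comfortably here), then normalizes and multiplies by $\sqrt{v_1}$ to obtain saturation coefficients. The helper name `row_times_table` is our own.

```python
import math

def row_times_table(w, M):
    """One column-by-table step: weight row i of M by w[i], then add the columns."""
    n = len(M)
    return [sum(w[i] * M[i][j] for i in range(n)) for j in range(n)]

R = [[1.00, 0.70, 0.26],
     [0.70, 0.75, 0.45],
     [0.26, 0.45, 0.35]]
n = len(R)

# Weights f0' R^m for m = 1..40, each reduced to fractions of its own total.
weights = []
w = [1.0] * n
for _ in range(40):
    w = row_times_table(w, R)
    total = sum(w)
    w = [x / total for x in w]
    weights.append(w)

limit = weights[-1]                     # treated as the limiting weights
errors = [max(abs(a - b) for a, b in zip(wm, limit)) for wm in weights[:8]]
ratios = [later / earlier for earlier, later in zip(errors, errors[1:])]

# Normalize to direction cosines, then multiply by sqrt(v1) for the saturations.
length = math.sqrt(sum(x * x for x in limit))
l1 = [x / length for x in limit]
Rl1 = row_times_table(l1, R)            # equals R l1, since R is symmetric
v1 = sum(a * b for a, b in zip(l1, Rl1))
saturations = [math.sqrt(v1) * x for x in l1]

print("error ratios:", [round(r, 4) for r in ratios])
print("saturation coefficients:", [round(s, 4) for s in saturations])
```

The ratios settle near 0.19, which for this table is close to $v_2 / v_1$. The saturations, divided by their own sum, reproduce the limiting weights, as the first term of (32) requires.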

* The obvious device is to use the well known formula for the sum of an 
infinite diminishing geometrical progression: but with a machine continuous 
multiplication by the constant ratio is almost as quick. It may be noted that the 
‘common ratio’ obtained in extrapolating for the first factor enables us to esti- 
mate at once the size of the variance for the second factor. Hence, without fur- 
ther calculations, we may decide whether the variance contributed by this second 
factor is also large enough to be statistically significant. Working instructions, 
with examples, have been drawn up for students who are unable to follow the 
algebraic demonstration but wish to apply the methods here described, and have 
been in use at the laboratory: they are too lengthy to print in an article, but may 
be obtained from the Psychological Laboratory, University College, London. Dewar has compared the results of what we call the "G. P." (geometrical progression) method with those of other procedures in a paper to the British Psychological Society (March, 1937) which will no doubt shortly be published.
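The "G. P." extrapolation described in the footnote can be sketched as follows. The function `gp_extrapolate` is a hypothetical helper: it applies the formula for the sum of an infinite geometric progression to the last two differences of three successive iterates. It is a minimal illustration under that assumption, not the laboratory's written instructions. On Kelley's table the estimated common ratio should come out near $v_2 / v_1$; multiplied by $v_1$, it gives a rough estimate of the second factor's variance.

```python
def row_times_table(w, M):
    """One column-by-table step: weight row i of M by w[i], then add the columns."""
    n = len(M)
    return [sum(w[i] * M[i][j] for i in range(n)) for j in range(n)]

def gp_extrapolate(a0, a1, a2):
    """Extrapolate a sequence whose differences shrink in geometric progression.
    Returns (estimated limit, common ratio), with q = (a2 - a1) / (a1 - a0) and
    limit = a2 + (a2 - a1) * q / (1 - q), the sum of the remaining progression."""
    q = (a2 - a1) / (a1 - a0)
    return a2 + (a2 - a1) * q / (1 - q), q

R = [[1.00, 0.70, 0.26],
     [0.70, 0.75, 0.45],
     [0.26, 0.45, 0.35]]
n = len(R)

seq = []                                # weights for m = 1, 2, 3, ...
w = [1.0] * n
for _ in range(40):
    w = row_times_table(w, R)
    total = sum(w)
    w = [x / total for x in w]
    seq.append(w)

# Extrapolate each weight from the first three iterates only.
results = [gp_extrapolate(seq[0][i], seq[1][i], seq[2][i]) for i in range(n)]
extrapolated = [estimate for estimate, _ in results]
common_ratios = [q for _, q in results]
limit = seq[-1]

print("third iterate:", [round(x, 6) for x in seq[2]])
print("extrapolated: ", [round(x, 6) for x in extrapolated])
print("limit:        ", [round(x, 6) for x in limit])
print("common ratios:", [round(q, 3) for q in common_ratios])
```

From only three iterations, the extrapolated weights land much closer to the limit than the third iterate itself does.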








After the saturation coefficients for the first factor ($f_\infty$, and therefore $H_1$) have been thus determined, the residuals $(R - H_1)$ can be calculated in the usual way; and, if significant, the saturation coefficients and hierarchy for the next factor may be computed if required. Similarly for the rest of the $n$ factors, until all the significant terms of the expansion (9) have been evaluated.

To decide whether a single general factor is alone sufficient to account for the observed correlations, i.e., whether $R = H_1$ within the limits of the sampling errors, there is no need to calculate all the second order minors ('tetrad differences') and their probable errors. We have only to inquire whether the residual variance, $v_2 + \cdots + v_n$ (i.e., the total variance less $v_1$), is statistically significant or not. I may add that, if we construct an artificial matrix of correlations from a given set of independent factors, the foregoing method of analysis appears to be the only one which will lead back to the factor-variances and saturation coefficients as originally given.*
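The residual step can be sketched in the same way. `first_factor` below is an illustrative power iteration by column-by-table steps, run for a fixed number of iterations (an arbitrary choice). It is applied first to Kelley's table and then to the residual $R - H_1$, whose diagonal sum gives the residual variance $v_2 + \cdots + v_n$.

```python
import math

def row_times_table(w, M):
    """One column-by-table step: weight row i of M by w[i], then add the columns."""
    n = len(M)
    return [sum(w[i] * M[i][j] for i in range(n)) for j in range(n)]

def first_factor(M, iterations=60):
    """Largest latent root and unit latent vector of a symmetric table, by repeated
    column-by-table steps starting from f0 = (1, ..., 1), normalizing at each step."""
    u = [1.0] * len(M)
    for _ in range(iterations):
        w = row_times_table(u, M)
        length = math.sqrt(sum(x * x for x in w))
        u = [x / length for x in w]
    Mu = row_times_table(u, M)
    return sum(a * b for a, b in zip(u, Mu)), u

R = [[1.00, 0.70, 0.26],
     [0.70, 0.75, 0.45],
     [0.26, 0.45, 0.35]]
n = len(R)

v1, l1 = first_factor(R)
r1 = [math.sqrt(v1) * x for x in l1]                         # saturation coefficients
H1 = [[r1[i] * r1[j] for j in range(n)] for i in range(n)]   # first latent hierarchy
residual = [[R[i][j] - H1[i][j] for j in range(n)] for i in range(n)]

residual_variance = sum(residual[i][i] for i in range(n))    # v2 + ... + vn
v2, l2 = first_factor(residual)

print(f"v1 = {v1:.4f}, residual variance = {residual_variance:.4f}, v2 = {v2:.4f}")
```

Whether the residual variance (about 0.34 here) is significant would still have to be judged against its sampling error, which the sketch does not attempt.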
CONTRIBUTION TO THE MATHEMATICAL BIOPHYSICS
OF ERROR ELIMINATION* 


The theory of error elimination developed by N. Rashevsky is 
extended and generalized. Theoretical conclusions obtained are com- 
pared with the experimental data available. 


HERBERT D. LANDAHL 
The University of Chicago 


In a paper by N. Rashevsky (1), a theory of delayed conditioned 
reflexes is developed and applied to the theory of error elimination. 
It is the purpose here to revise and extend the treatment of the prob- 
lem without changing the physical assumptions regarding the process 
of conditioning. A somewhat more general discussion of the problem 
of maze learning will then be outlined qualitatively and the results 
compared with available experimental data. Several experiments are 
suggested since they follow directly from a consideration of the prob- 
lem. 

The following equations are developed by N. Rashevsky (1):

$$R = F(1 - e^{-\alpha n})\, I, \qquad (1)$$

$$I(x) = I_0 e^{-\beta x}, \qquad (2)$$

$$I(x, n) = I_0 e^{-\beta x} - b \int_x^{x_0} R(x, n)\, dx, \qquad (3)$$

where $R$ is the strength of the conditioned response, $n$ the number of trials, $I$ the intensity of the stimulus, $x$ the abscissa of a particular nerve center; $\alpha$, $\beta$ and $F$ are parameters depending on the nerve constants, and $b$ is the proportionality constant for inhibition. The following definitions are also given: the time $t$, the velocity of wave propagation $v$, and the time between the stimulus and secondary response, or the delay time for conditioning, $\tau$. For $t = 0$, $x = x_0$, and for $t = \tau$, $x = 0$.

Let us consider a slightly modified form of the behaviour pattern 
discussed by N. Rashevsky. Consider the pattern 


[Diagram: time axis marked 0, $t_0$ and $\tau$.]


* This investigation has been made possible by a grant from the Rockefeller 
Foundation to the University of Chicago. 
The stimulus $S_1$, occurring at $t = 0$, produces the unconditioned response $R_1$, which, after a time $\tau$, results in $S_2$, a stimulus which produces $R_2$ unconditionally. Neglect the reaction time between $S_2$ and $R_2$. Let $R_2$ be a response which is the negative of $R_1$. Also let the physical situation be such that if the response $R_1$ continues past a particular point, $R_3$, which occurs at the time $t_0$, then the response $R_2$ is unavoidable.

More specifically, let us assume that $S_1$ represents the "sight" of an alley entrance, $R_1$ is the act of starting forward to enter, $R_3$ is the act of entering, and $R_2$ is the act of turning back due to the alley end. $t_0$ is then the time between the stimulus $S_1$ and the entering of the alley. If $a$ is the average velocity of travel in an alley, we have

$$\tau = t_0 + \frac{l}{a}, \qquad x_0 = v \tau = v t_0 + \frac{v l}{a}. \qquad (4)$$


In equation (3), we have taken the upper limit of the integral to be $x_0$, thereby assuming that the centers near $x_0$ are not inhibited by the time elimination of the error occurs, or that the range of inhibition is small. It is evident from the form of the wave equation that the effect of conditioning near $x_0$ is small compared to the total conditioned response over the range from $x_0$ to $x$.

Substituting the value of $R(x, n)$ from equation (1) in (3), we have:

$$I(x, n) = I_0 e^{-\beta x} - b \int_x^{x_0} F(1 - e^{-\alpha n})\, I(x, n)\, dx, \qquad (5)$$

for which we have the solution (2):


F b(li—e™) (1 =aF | 


BLL-+F 6(1—e™) x] (6) 





I(2zn) =f, jew — 

Now since the intensity is taken as sufficiently strong to elicit all the response conditioned, we have for the total response at $x$ after $n$ trials

$$R_T(x, n) = \int_x^{x_0} R(x, n)\, dx. \qquad (7)$$


Substituting from equation (6) in (1) and then in (7), we have: 





~ 


— ae 2. e: FS 
FI,
B 


x patie e-fro — 


Rr (2,n) — (1 —e") 








F b(1— e) (1 — e-F) (a, — x) (8) 
1+ F b(1—e™) x | 
Let $n_1$ be the number of trials for which the conditioned response at the moment $\tau$ is of itself, without the secondary stimulus, sufficient to produce the response $R_2$, or $R_T(0, n_1) = R_2$. Equation (8) then becomes:


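$$R_2 = \frac{F I_0}{\beta} (1 - e^{-\alpha n_1}) \left[ 1 - e^{-\beta x_0} - \frac{R_2 b \beta x_0}{I_0} \right]. \qquad (9)$$

Solving for $n_1$, we have

$$n_1 = \frac{1}{\alpha} \log \frac{F \left[ I_0 (1 - e^{-\beta x_0}) - R_2 b \beta x_0 \right]}{F \left[ I_0 (1 - e^{-\beta x_0}) - R_2 b \beta x_0 \right] - R_2 \beta}. \qquad (10)$$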


Equations (10) and (4) then give the function $n_1(l)$, the relation between the number of trials for elimination of the alley end and the length of the alley.

If we let

$$\varphi(x_0) = I_0 (1 - e^{-\beta x_0}) - R_2 b \beta x_0, \qquad (11)$$





we must have $\varphi_{\max} > R_2 \beta / F$ for positive, real values of $n_1$. For $\varphi$ to be positive we must have $b < I_0 / R_2$. The lower and upper limits, $l_1$ and $l_2$, of the alley length $l$ are given by the roots of the equation $\varphi = R_2 \beta / F$, where we substitute the value for $x_0$ from equation (4). Now the value of $b$ must be taken very small in order that the range from $l_1$ to $l_2$ be large, and also in order that the relation $n_1(x_0)$ satisfy more general situations which require that the minimum should occur for values of $x_0$ which are small compared with the whole range. Then, taking $b$ to be very small, $\varphi$ increases rapidly to its maximum value and thereafter decreases slowly. In this case $\varphi_{\max}$ occurs for a value of $l$ near $l_1$. Since the function $n_1(l)$ increases monotonically with the reciprocal of $\varphi$, the graph of the function may be represented as in Figure 1, and $n_1$ increases with $l$ except possibly for small values of $l$.
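The conditions on $\varphi$ can be explored numerically. This sketch rests on two stated assumptions. First, the parameter values ($I_0 = 1$, $\beta = 1$, $R_2 = 0.5$, $b = 0.05$, $F = 2$, $\alpha = 0.1$) are hypothetical. Second, equation (10) is taken in the form $n_1 = \frac{1}{\alpha} \ln \frac{F \varphi}{F \varphi - R_2 \beta}$, which is consistent with the conditions stated in the text. The sketch works in terms of $x_0$ rather than $l$, so it does not depend on equation (4); the helper names are our own.

```python
import math

# Hypothetical parameter values, chosen only to illustrate the shape of phi and n1.
I0, beta, R2, b, F, alpha = 1.0, 1.0, 0.5, 0.05, 2.0, 0.1

def phi(x0):
    """Equation (11): phi(x0) = I0 (1 - exp(-beta x0)) - R2 b beta x0."""
    return I0 * (1.0 - math.exp(-beta * x0)) - R2 * b * beta * x0

def n1(x0):
    """Assumed form of equation (10): n1 = (1/alpha) ln[F phi / (F phi - R2 beta)].
    Returns None where no positive, real n1 exists (F phi <= R2 beta)."""
    p = F * phi(x0)
    if p <= R2 * beta:
        return None
    return math.log(p / (p - R2 * beta)) / alpha

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f changes sign there."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

threshold = R2 * beta / F
x_peak = math.log(I0 / (R2 * b)) / beta       # phi'(x0) = 0 here
excess = lambda x0: phi(x0) - threshold
x_low = bisect(excess, 1e-9, x_peak)          # lower limit of x0
x_high = bisect(excess, x_peak, 100.0)        # upper limit of x0

print(f"b < I0/R2: {b < I0 / R2}")
print(f"phi_max = {phi(x_peak):.4f} > R2*beta/F = {threshold:.4f}")
print(f"n1 finite only for {x_low:.4f} < x0 < {x_high:.4f}; least n1 = {n1(x_peak):.3f}")
```

With $b$ this small, $\varphi$ peaks early and declines slowly, so the finite range of $x_0$ is lopsided (roughly 0.3 to 30 here), and $n_1$ is least where $\varphi$ is greatest.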

The value of $l_1$, the lower limit of $l$, may be either negative or positive. If $l_1$ is negative, a finite number of trials is required to eliminate an alley of infinitesimal length. But if $l_1$ is positive, the alley is not eliminated unless its length exceeds $l_1$. We might require that the parameters satisfy the condition that $l_1$ is negative, but this is not essential, since a sufficiently short alley would never be entered. The upper and lower limits do, however, give some restriction which may assist in the evaluation of the parameters.

FIGURE 1

Since the upper limit of the integral of equation (3) was taken to be $x_0$, $R(x, n)$ is always positive, so that $R_T(x, n)$, in equation (7), increases monotonically as $x$ varies from $x_0$ to zero. Hence the response at $(x_0 - v t_0)$, the abscissa corresponding to the alley entrance, is less than the response at any point between $(x_0 - v t_0)$ and zero. There is then no possibility of the elimination of the entrance before the end of the alley. Also, the entrance of the alley will not be eliminated if the end cannot be eliminated, that is, when $l$ does not lie between $l_1$ and $l_2$.

Let us now consider the response for trials after $n_1$. As the conditioning proceeds, the total response must equal $R_2$ at some center $x$ greater than zero. Evidently $x$ depends on $m$, the number of trials after $n_1$, and determines the position at which the animal turns about in the maze. Integrating $R(x, n_1)$ with respect to $x$ from $x$ to $x_0$, we obtain $R_T(x, n_1)$, the total response due to the conditioning from the first $n_1$ trials, when the wave front has reached the center $x$. Then

FI, 
B 


< | eB? — ebro — 





Rr(xz,n,) = (1—e-™) 


F b(1— e-™) (1 — e-*) (4, — zx) 
1+ Fb(1—e™) a, |: (12) 





Substituting the value of $n_1$ from equation (10) into (12) we have:

(13)





a * I,(1— e**) — Rb Bx 
Rr (@a,=R,|} o( ef) —R a 


~ Iy(1— e*) —R, bP x, 
With each trial after $n_1$, the conditioning proceeds with a shorter delay time $\tau' = (x_0 - z)/v$. Then in place of equation (2), we have for the intensity of the stimulus at $x$, when the wave front is at $z$,

$$I(z, x) = I_0 e^{-\beta (x - z)}. \qquad (14)$$


Since the conditioning for each trial occurs at different positions of $z$, or correspondingly different times $t$, we cannot use equation (1). Let us introduce a more general equation. If $R_i$ is the response conditioned during the $i$-th trial, then $R_i$ must decrease monotonically with successive trials. Let us then consider the equation

$$R_i = F (1 - e^{-\alpha}) e^{-\alpha (i - 1)} I(z, x). \qquad (15)$$

Summing over $i$, we have the total response

$$R = \sum_i R_i. \qquad (16)$$

If $I(z, x)$ is not a function of $i$, equation (16) with (15) reduces to (1), since

$$(1 - e^{-\alpha}) \sum_{i=1}^{n} e^{-\alpha (i - 1)} = 1 - e^{-\alpha n}. \qquad (17)$$
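Equation (17) is a telescoping geometric sum. The same reasoning gives the partial sum over trials $n_1 + 1$ to $n_1 + m$ that appears in equation (20) once $z(i)$ is replaced by an average value, as assumed below: $(1 - e^{-\alpha}) \sum_{i=n_1+1}^{n_1+m} e^{-\alpha(i-1)} = e^{-\alpha n_1} - e^{-\alpha(n_1 + m)}$. The sketch checks both numerically for a few arbitrary values of $\alpha$; the helper name `conditioned_sum` is our own.

```python
import math

def conditioned_sum(alpha, first, last):
    """Sum over trials i = first..last of (1 - e^-alpha) e^{-alpha (i - 1)},
    i.e. equation (15) with F = 1 and a stimulus intensity that does not depend on i."""
    return sum((1.0 - math.exp(-alpha)) * math.exp(-alpha * (i - 1))
               for i in range(first, last + 1))

# Equation (17): trials 1..n together give 1 - e^{-alpha n}, the factor in equation (1).
for alpha in (0.05, 0.3, 1.0):
    for n in (1, 5, 40):
        assert abs(conditioned_sum(alpha, 1, n) - (1.0 - math.exp(-alpha * n))) < 1e-12

# Partial sum over trials n1 + 1 .. n1 + m (values chosen arbitrarily).
alpha, n1, m = 0.3, 7, 12
partial = conditioned_sum(alpha, n1 + 1, n1 + m)
closed_form = math.exp(-alpha * n1) - math.exp(-alpha * (n1 + m))
print(f"partial sum = {partial:.12f}, closed form = {closed_form:.12f}")
```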


Substituting equation (14) into (15), and integrating according to (3), we have for the contribution to the conditioning at $x$ by the $i$-th trial ($i > n_1$)


R(x,x) i = F(1 — @-*) e-2(5-1) 


x{[t pitas of [Remax 43R; | (18) 


M141 


Now the expression in the braces is evidently the total conditioned response from all the trials and must equal $R_2$ because of the definition of $x$. Then, introducing $R_2$ into equation (18) and integrating with respect to $x$ from $z$ to $x_0$, we have:

$$R_T(z)_i = \frac{F}{\beta} (1 - e^{-\alpha}) e^{-\alpha (i - 1)} \left[ I_0 \left(1 - e^{-\beta (x_0 - z)}\right) - R_2 b \beta (x_0 - z) \right]. \qquad (19)$$

Then summing over $i$ from $(n_1 + 1)$ to $(n_1 + m)$ we obtain:











174 PSYCHOMETRIKA 





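$$R_T(x)_m = \frac{F}{\beta} (1 - e^{-\alpha}) \sum_{i = n_1 + 1}^{n_1 + m} e^{-\alpha (i - 1)} \left[ I_0 \left(1 - e^{-\beta (x_0 - z)}\right) - R_2 b \beta (x_0 - z) \right], \qquad (20)$$

where $z$ is a function of $i$.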
The sum of $R_T(x, n_1)$ and $R_T(x)_m$ is evidently the total response at $x$ due to the conditioning from all $(n_1 + m)$ trials, and is equal to $R_2$ from the definition of $x$, or

$$R_2 = R_T(x, n_1) + R_T(x)_m. \qquad (21)$$

If $m = m'$ for $x(m') = x_0 - v t_0$, and if $n' = n_1 + m'$, then $n'$ is the number of trials necessary to eliminate the alley entrance. In order to use equation (20), let us assume that $z(i)$ in the summation has an average value which is some fraction $(1 - \eta)$ of $x_0$, or $x_0 - \bar{z} = \eta x_0$. Using the foregoing with equations (13) and (20), we have for equation (21)











ca 
I,(1 — e-6%) — RK, b B Xo 
F ' 
Or eI) er Od (22) 
Solving for (1 — e-”’) we have 
, R, B 
Y 1, (1 — eB(2o-¥70)) —__ R, B b(t — V 20) 
— eam ‘ 
[I, (1 — e-F°) — R, b B xo] (1, (1 — eF1%) — R, b Bn Xo] 
(23) 
Since $n' = n_1 + m'$, we have, using equations (10) and (23),
‘om 1, 
a oe 





1 R,B[I.(1—e 8 rrr) + I, (1—e-62%) — R,bf (%>—V15+-%>) ] 
=f F[I,(1—e6) — R, bfx] [I.(1—e-6"°) — R,bBnxo] 


(24) 
which with equation (4) gives $n'(l)$.
Now equation (24) may be written
Lo—V 
(1—e>"’) ie (1—e-™™) 2 +e (25) 
yp (%o—x) 
where $\varphi$ is defined by equation (11).
Using the first term of the expansion of the exponentials, we have for very small $n$'s:


w= iy) pp, CS) I 26 
nue ome ~— 


The following relations can be verified, as we have b very small, 
g>0,and %—vy > 2x >O0. For % < vt or % — 2X > % — Vt, 


$\varphi(x_0 - v t_0)/\varphi(x_0 - x)$ increases monotonically with $x_0$, and, except for
small values of $x_0$, the ratio is greater than unity. In this case $n'(l)$
is similar to the curve in Figure 1, with the minimum displaced to the
left. For


$x > v t_0$, or $x_0 - x < x_0 - v t_0$,


$\varphi(x_0 - v t_0)/\varphi(x_0 - x)$ decreases monotonically with $x_0$, and, except for
small values of $x_0$ in a range larger than in the above case, the ratio
is less than unity. Over the range for which the ratio is less than
unity, $n'$ increases with $l(x_0)$ even though the ratio itself does not. In
this case, then, the curve $n'(l)$ is also similar to the curve in Figure 1,
but has the minimum displaced to the right. If we take 7x ~ x, 
then the minimum of $n'$ might be displaced to a point along $x_0$ which
is twice as great. In either case, $n'$ increases with $l$ except for small
values, and approaches a limit. However, if we do not restrict $b$ to
small values, $n'(l)$ may have more than one minimum, or decrease
with $l$ over a large range.

It should perhaps be emphasized that the above discussion applies
not only to the elimination of a dead end but to any response pattern
in which an extraneous or disagreeable secondary response is the
negative of a primary response. In any case, the time $\tau$ is the time between
the primary stimulus and the secondary stimulus, while $t_0$ represents the
time after the primary stimulus during which the result of the
primary response can be avoided. In certain situations, the secondary
stimulus can be avoided until the very instant that it is received. In
the response sequence, given in a preceding paragraph, FR, coincides 
with S, in time, and 7, equals 2. If we replace x by vz in equation (10), 
we obtain $n(v\tau)$, the number of trials needed to eliminate the
extraneous or disagreeable response completely. The function $n(v\tau)$ is also
represented by the curve in Figure 1, with the origin at the point for
which $x = 0$.

Since the theoretical function has been obtained, it would be of interest
to check it against the results of an experimental determination of
$n(v\tau)$, especially if the relation is determined for very small values
of $\tau$. The existence of a lower limit for $\tau$ might then be found. It is
evidently necessary to prevent anticipation of the primary stimulus.
The existence of an upper limit for $\tau$ might be obscured by the
formation of chain associations, so that it would be necessary in the
experimental procedure to minimize their effect as much as possible.

The parameters in equation (10) for $n(v\tau)$ may be obtainable
from experiment. One might expect the parameters to change but 
slightly for various response patterns involving the same receptor and 
motor organs. Should the theory prove to be sufficiently complete, it 
would be of interest to find a possible relation of these parameters to 
factors obtained by other methods. 

The simplified mechanism discussed here takes into account only 
one factor, that is, the mechanism of elimination of a wrong alley. 
In actual experiments, both wrong and correct alleys are present to- 
gether so that there is at least one other major factor. Therefore a 
strict comparison of theory and experiment cannot be made at this 
stage. But we can show that there is no disagreement if we take the 
second factor into account. Let us do this here in a qualitative manner 
and on the basis of preliminary investigations. A more complete 
mathematical discussion of the problem may be presented at some 
later time. 

The two factors may then be stated as follows: (1) Each blind 
alley tends to be eliminated owing to the association of the return 
with the entrance. It is most probable that the parameters are such 
that the shortest alleys would be soonest eliminated. It is this factor 
which has been discussed in the first section. (2) The goal response 
(food) becomes associated with each alley entrance, and this associa- 
tion is the stronger the shorter the time between the stimulus and 
response. Preliminary considerations show that this second factor re- 
quires (a) that the longer route to a goal response be eliminated, 
(b) that the alleys near the goal be eliminated more readily, (c) that 
the longer the alley, the sooner it be eliminated, and this holds es- 
pecially for alleys near the goal. 

Considering the two factors simultaneously, we would expect that 
the first alleys of a maze would be eliminated sooner the shorter they 
are, but that the order of elimination near the goal would even be 
inverted due to the increased importance of the second factor. The ra- 
tio of the number of trials for long alleys to the number of trials for 
short alleys, 7, would then be a fraction greater than unity for the first 
alleys of the maze, and would decrease with successive alleys to a 
value possibly less than unity near the goal. 
In order to compare these conclusions with experimental data,
let us assume that the number of trials, $n_1$, for elimination of the alley
end is proportional to the total number of "complete entrances" made
by a group of animals throughout an experiment, provided the maze
is quite well learned, so that additional runs change the totals only
slightly. We use $n_1$ here since equation (10) is simpler than equation
(24); however, similar results are obtained if we use $n'$. Since random
stimuli are present but are not considered in the discussion, one
should not be disturbed by the fact that "complete" and "partial"
entrances seem to occur in an almost random order.

In the experiment by Peterson (3), two mazes were used. In 
Maze B, two groups of rats were used. For one group the alleys 1, 3, 
5, 7, 8, 10 were shortened while for the other group the alleys 2, 4, 6, 
9 were shortened. In Maze A, with six blind alleys, one group of rats 
ran the maze with all alleys full length, while the other group ran 
the maze with all alleys shortened. In order to compare ratios of the 
number of entrances to long alleys to that of short alleys for Maze A, 
all values from one group of rats were multiplied by a constant so that 
the grand totals of all entrances for each group would be equal. For 
Maze B the procedure was the same. The ratios of “long to short” 
entrances for each alley are shown graphically for both the A and B 
mazes in Figure 2. The ratios are given with the alleys equally
spaced, though the intervals are linear neither with distance along
the maze nor with time. In Maze A, alley 1 is the only blind alley
occurring alone. The general decrease in the ratio, which occurs for
both mazes, is in accord with the arguments discussed above.

FIGURE 2. Peterson's data (Maze A, 6 alleys): ratio of "long" to "short" entrances plotted against alleys.

In the experiments I and II by Tolman et al. (4), and the
experiment by White and Tolman (5), the rats were given the choice
of a short blind, a long blind, and the correct path to the goal. Owing
to the proximity of the goal in these experiments, one would expect
the second factor to predominate, so that, because of factor (2c), the
ratio would be less than unity. The ratios obtained, using "complete
entrances," were 0.79, 1.14, and 0.47 respectively, giving a weighted
average ratio of 0.82. The last group of alleys in the A and B mazes
used by Peterson (alleys 5 and 6 of Maze A and alleys 8, 9, and 10 of Maze B) gives a weighted
average ratio of 0.63.

In the case of the non-hungry rats in experiments I and II (4),
there is a definite inversion, and an explanation for it is set forth there. It
might be added here that the association with food would not be
expected to be as strong for non-hungry animals, so that the relative
unimportance of the second factor (2c) might well have contributed
to the inversion.
It would be of interest to use the set-up described by Tolman et
al. in experiment I, but with various lengths of the alley leading to
food. With increased length of this path, the number of trials would
be expected to increase, but there should also be an increase in the
ratio of "long to short" elimination and even a possible inversion.
For a correct path equal in length to either the long or the short
blind, certain peculiarities might be expected. The lengthening of the
correct path might be attained by a detention cell, but here the cell
might become a sufficient substitute for the goal response. Even in
the case of a long alley this substitution might occur and obscure the
results.

The presence of two factors, in addition to an indefinite number
of small distracting factors, already makes the situation too complex.
It would be desirable to eliminate one of the factors. The first factor
may be minimized by the set-up shown in Figure 3, in which the animal
does not have to retrace its steps. In the first unit of such a maze, one
would expect the early entrances to occur by chance. The total
number of trials to obtain a fair degree of learning would decrease
with successive alleys as the goal is approached, and the ratio of
"long to short" entrances would also decrease from approximately unity
near the entrance to a minimum near the goal, according to factor
(2c).

In a maze of the type shown in Figure 3 the correct alley en- 
trance is either to the left or right; the middle path could be made 
correct by raising the passageway or by suitable gates. The last sec- 
tion of the maze, as shown, illustrates one method of making the center 
path correct, and could be used for investigating the effects on maze 
behaviour and learning of a path which completely encircles the goal 
(or entrance). 

We have presented a mathematical discussion of factor one, and 
have given a preliminary, qualitative discussion of factor two. From 
factor one, we expect more trials for the elimination of a longer blind 
alley. From factor two, we expect fewer trials for the elimination of 
a longer blind alley, and this holds especially near the goal. The
combination of the factors gives qualitative conclusions which are borne
out by the negative slope of the points in the accompanying plots.
There is also no disagreement in the case of other available data. 
Certain experimental approaches which are suggested by theoretical 
considerations of the problem are discussed. Indications are that the 
problem may be solved more readily by a somewhat more general ap- 
proach. An attempt will be made to obtain this solution in the near 
future. 
THE ORTHOGONAL TRANSFORMATIONS OF A FACTORIAL
MATRIX INTO ITSELF 


WALTER LEDERMAN 


Moray House, University of Edinburgh 


Certain matrix algebra, pertinent to multiple factor theory, is 
presented. 


I. Introductory 


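If a team of $n$ tests be resolved into $r$ group factors and $n$
specifics, the (complete) factorial matrix or matrix of loadings is an
$n \times (n+r)$ matrix of the form

$$N = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1r} & m_1 & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & l_{2r} & 0 & m_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nr} & 0 & 0 & \cdots & m_n \end{pmatrix}, \tag{1}$$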


where $l_{i\alpha}$ is the loading of the $\alpha$th general factor in test $i$ and $m_i$ is the
loading of the specific in test $i$. On the assumption that all the factors
(including the specifics) are standardized and mutually uncorrelated,
the correlation matrix $R$ and the factorial matrix $N$ are connected
by the equation

$$R = NN'. \tag{2}$$


When the tests are expressed in terms of some other set of
$(n+r)$ statistically independent factors, the factorial matrix is
changed to

$$\bar N = NB, \tag{3}$$

where $B$ is an orthogonal matrix of order $(n+r)$. The relation (2),
however, is invariant under this transformation, i.e., we have

$$R = \bar N \bar N'.$$


In two recent publications* Professor Godfrey H. Thomson, when
discussing the indeterminacy of the factors, has constructed matrices
$B$ which not only leave the relation (2) unaltered, but also preserve
the matrix of loadings itself, thus

$$N = NB. \tag{4}$$

* 1. "The Definition and Measurement of g," Journal of Educational Psychology, vol. 26 (1935), pp. 241-262.
2. "Some Points of Mathematical Technique in the Factorial Analysis of Ability," ibidem, vol. 27 (1936), pp. 37-54.


The object of this paper is to find the general form of such a
matrix, i.e., of a matrix $B$ which has the following properties:

(i) $B$ is orthogonal;
(ii) $B$ satisfies equation (4).

Professor Godfrey Thomson's solution is of the form*

$$B = I - \frac{2qq'}{q'q}, \tag{5}$$

where

$$q = \{q_1, q_2, \ldots, q_r, q_{r+1}, \ldots, q_{r+n}\}$$

is a column vector whose elements are defined as

$$q_1 = q_2 = \cdots = q_r = -1, \qquad q_{r+i} = \frac{\text{sum of general loadings in test } i}{\text{specific loading in test } i}. \tag{6}$$

While in the hierarchical case ($r = 1$) the matrix given by (5)
and (6) is the only solution (apart from the trivial solution $B = I$),
we shall see that in the case of several group factors an infinity of
solutions can be found depending on $r(r-1)/2$ arbitrary parameters.


II. The General Solution 


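It is convenient to write the factorial matrix in the form

$$N = [L \;\; M], \tag{7}$$

where

$$L = \begin{pmatrix} l_{11} & \cdots & l_{1r} \\ \vdots & & \vdots \\ l_{n1} & \cdots & l_{nr} \end{pmatrix}, \qquad M = \begin{pmatrix} m_1 & & \\ & \ddots & \\ & & m_n \end{pmatrix}.$$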


We shall always assume that M is non-singular, i.e., that none of 
the specifics vanishes. 

Before discussing our actual problem, we shall consider a matrix
$Z$ with $(n+r)$ rows, but with an unspecified number of columns, and
suppose that

$$NZ = 0. \tag{8}$$

On writing

$$Z = \begin{pmatrix} X \\ Y \end{pmatrix}, \tag{9}$$

where $X$ has $r$ rows and $Y$ has $n$ rows, and on using (7), we see that
(8) is equivalent to

$$LX + MY = 0,$$

whence

$$Y = -M^{-1}LX.$$

* See op. cit. (2), pp. 39-40.


Substituting this in (9) we find that $Z$ becomes

$$Z = -\begin{pmatrix} -I \\ M^{-1}L \end{pmatrix} X,$$

or

$$Z = -QX, \tag{10}$$

where

$$Q = \begin{pmatrix} -I \\ M^{-1}L \end{pmatrix} \tag{11}$$

is a given matrix in our problem. Hence every solution of (8) must
be of the form (10) and, conversely, every matrix (10) with
arbitrary $X$ annihilates $N$ when operating as a post-factor.
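As a small numerical illustration of equations (7), (10) and (11), the following Python sketch builds $Q$ for a set of hypothetical loadings and checks that $NQ$ vanishes, so that every $Z = -QX$ satisfies (8). The values in `L` and `m` are invented for the example, and so are the helper names `matmul`, `factorial_matrix` and `q_matrix`. Matrices are stored as plain lists of rows.

```python
# Numerical check of equations (7) and (11): N = [L M] and Q = [-I ; M^{-1} L]
# satisfy N Q = 0.  The loadings below are hypothetical illustration values.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def factorial_matrix(L, m):
    """N = [L  M], with M = diag(m_1, ..., m_n)  (equation (7))."""
    n = len(L)
    return [list(L[i]) + [m[i] if j == i else 0.0 for j in range(n)]
            for i in range(n)]

def q_matrix(L, m):
    """Q = [-I ; M^{-1} L], an (r+n) x r matrix  (equation (11))."""
    r = len(L[0])
    top = [[-1.0 if a == b else 0.0 for b in range(r)] for a in range(r)]
    bottom = [[L[i][a] / m[i] for a in range(r)] for i in range(len(L))]
    return top + bottom

L = [[0.6, 0.3], [0.5, 0.4], [0.7, 0.2], [0.4, 0.5]]  # n = 4 tests, r = 2 group factors
m = [0.5, 0.6, 0.4, 0.7]                               # non-zero specific loadings

N = factorial_matrix(L, m)
Q = q_matrix(L, m)
NQ = matmul(N, Q)
print(max(abs(x) for row in NQ for x in row))  # zero up to floating-point rounding
```

Each entry of $NQ$ is $-l_{i\alpha} + m_i \cdot l_{i\alpha}/m_i$. This is why the non-singularity of $M$ is needed.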

Now, if $B$ is a solution of (4), we have

$$N(B - I) = 0,$$

and from the above remark it follows that

$$B - I = -QX,$$

or

$$B = I - QX. \tag{12}$$

Every matrix of this form satisfies condition (ii), no matter what $X$
is. In order to satisfy condition (i) we shall choose $X$ in such a way
that $B$ becomes an orthogonal matrix; i.e., we must have

$$BB' = (I - QX)(I - QX)' = I,$$

or

$$(I - QX)(I - X'Q') = I. \tag{13}$$

Expanding this matrix equation and cancelling the term $I$ on each
side we obtain

$$-QX - X'Q' + QXX'Q' = 0. \tag{14}$$
Before continuing with the analysis we observe that

$$Q'Q = I + L'M^{-2}L$$

is a positive-definite and, consequently, non-singular matrix of order
$r$ and, therefore, possesses a reciprocal and a "square root"* $(Q'Q)^{1/2}$
which is also non-singular and real.
If we now premultiply (14) by $Q'$ and solve for the first term, we get

$$X = (Q'Q)^{-1}(Q'QXX' - Q'X')Q'.$$

This shows that $X$ must be of the form

$$X = TQ', \tag{15}$$

where $T$ is an $(r \times r)$ matrix yet to be determined.
On substituting (15) in (12) and (13) we find

$$B = I - QTQ', \tag{16}$$

$$(I - QTQ')(I - QT'Q') = I. \tag{17}$$

After premultiplying this equation by $Q'$ and postmultiplying by $Q$
we can write

$$(Q' - Q'QTQ')(Q - QT'Q'Q) = Q'Q,$$

$$[I - (Q'Q)T](Q'Q)[I - T'(Q'Q)] = Q'Q,$$

$$\{(Q'Q)^{-1/2}[I - (Q'Q)T](Q'Q)^{1/2}\} \times \{(Q'Q)^{1/2}[I - T'(Q'Q)](Q'Q)^{-1/2}\} = I,$$

$$\{(Q'Q)^{-1/2}[I - (Q'Q)T](Q'Q)^{1/2}\} \times \{(Q'Q)^{-1/2}[I - (Q'Q)T](Q'Q)^{1/2}\}' = I.$$

This means that

$$(Q'Q)^{-1/2}[I - (Q'Q)T](Q'Q)^{1/2} \quad\text{or}\quad I - (Q'Q)^{1/2}T(Q'Q)^{1/2}$$

is an $r$-rowed orthogonal matrix. Let

$$I - (Q'Q)^{1/2}T(Q'Q)^{1/2} = U.$$

Hence, solving for $T$,

$$T = (Q'Q)^{-1/2}(I - U)(Q'Q)^{-1/2},$$

and, by (16),

$$B = I - Q(Q'Q)^{-1/2}(I - U)(Q'Q)^{-1/2}Q'. \tag{18}$$
* See M. Bocher: Introduction to Higher Algebra, New York (1907), p. 299. 
Conversely, it can easily be verified that any matrix $B$ of the form
(18) satisfies conditions (i) and (ii) of Section I, provided $Q$ is defined
by (11) and $U$ is an orthogonal matrix of order $r$. Thus (18) is the
general solution of our problem.
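The general solution (18) can also be checked numerically. The sketch below makes several assumptions:

- It takes $r = 2$, so that $(Q'Q)^{-1/2}$ can be computed from the closed-form square root of a $2 \times 2$ positive-definite matrix. This is an implementation convenience and not part of the paper.
- It takes $U$ to be a plane rotation through an arbitrary angle.
- The loadings and all function names are hypothetical.

Under these assumptions it confirms that the resulting $B$ is orthogonal and satisfies $N = NB$.

```python
import math

# Sketch of the general solution (18) for r = 2 group factors:
#   B = I - Q (Q'Q)^{-1/2} (I - U) (Q'Q)^{-1/2} Q',  U any 2 x 2 orthogonal matrix.
# Loadings are hypothetical; the 2 x 2 square-root formula is an implementation choice.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv_sqrt_2x2(S):
    """S^{-1/2} for a symmetric positive-definite 2 x 2 matrix S, using
    S^{1/2} = (S + sqrt(det S) I) / sqrt(trace S + 2 sqrt(det S))."""
    (a, b), (c, d) = S
    s = math.sqrt(a * d - b * c)
    t = math.sqrt(a + d + 2 * s)
    (p, q), (u, v) = [[(a + s) / t, b / t], [c / t, (d + s) / t]]  # S^{1/2}
    det = p * v - q * u
    return [[v / det, -q / det], [-u / det, p / det]]

def general_solution(L, m, theta):
    """B of equation (18), with U the rotation through angle theta."""
    r, n = len(L[0]), len(L)
    assert r == 2, "this sketch only handles r = 2"
    Q = [[-1.0 if a == b else 0.0 for b in range(r)] for a in range(r)]
    Q += [[L[i][a] / m[i] for a in range(r)] for i in range(n)]   # equation (11)
    U = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    W = inv_sqrt_2x2(matmul(transpose(Q), Q))                      # (Q'Q)^{-1/2}
    core = matmul(matmul(W, subtract(identity(r), U)), W)
    return subtract(identity(r + n), matmul(matmul(Q, core), transpose(Q)))

L = [[0.6, 0.3], [0.5, 0.4], [0.7, 0.2], [0.4, 0.5]]
m = [0.5, 0.6, 0.4, 0.7]
N = [list(L[i]) + [m[i] if j == i else 0.0 for j in range(4)] for i in range(4)]

B = general_solution(L, m, theta=0.7)
orth_err = max(abs(x) for row in subtract(matmul(B, transpose(B)), identity(6)) for x in row)
inv_err = max(abs(x) for row in subtract(matmul(N, B), N) for x in row)
print(orth_err, inv_err)  # both zero up to rounding
```

Setting `theta = 0` gives $U = I$ and hence the trivial solution $B = I$. Different angles give different matrices $B$.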
It is of interest to note that different matrices $U$ give rise to
different solutions $B$. For suppose we had

$$I - Q(Q'Q)^{-1/2}(I - U_1)(Q'Q)^{-1/2}Q' = I - Q(Q'Q)^{-1/2}(I - U_2)(Q'Q)^{-1/2}Q';$$

it would follow that

$$Q(Q'Q)^{-1/2}(I - U_1)(Q'Q)^{-1/2}Q' = Q(Q'Q)^{-1/2}(I - U_2)(Q'Q)^{-1/2}Q'.$$

On premultiplication by $Q'$ and postmultiplication by $Q$ this becomes

$$(Q'Q)^{1/2}(I - U_1)(Q'Q)^{1/2} = (Q'Q)^{1/2}(I - U_2)(Q'Q)^{1/2},$$

and since $Q'Q$ is non-singular,

$$U_1 = U_2.$$

Thus, there exists a (1,1)-correspondence between the solutions of
the problem and all possible orthogonal matrices of order $r$. The
general solution, therefore, depends upon $r(r-1)/2$ independent
parameters.


III. Particular Solutions 


1. The simplest orthogonal matrix $U = I$ always yields the
trivial result

$$B = I.$$

The next simplest case is

$$U = -I.$$

This corresponds to the solution

$$B = I - 2Q(Q'Q)^{-1/2}(Q'Q)^{-1/2}Q',$$

or

$$B = I - 2Q(Q'Q)^{-1}Q'. \tag{19}$$

In particular, when $r = 1$, the only "orthogonal matrices" are the
numbers $+1$ and $-1$, and the matrix $Q$ reduces to a single column
(see eq. (11)), viz.:

$$Q = q = \left\{-1, \frac{l_1}{m_1}, \frac{l_2}{m_2}, \ldots, \frac{l_n}{m_n}\right\},$$

where $l_i = l_{i1}$ is the loading of the general factor in test $i$. The
quantity $q'q$ is then a scalar and (19) becomes

$$B = I - \frac{2qq'}{q'q}.$$

This is the formula at which Professor Godfrey Thomson arrived in
the first of the two papers cited above.
2. If

$$c = \{c_1, c_2, \ldots, c_r\}$$

is any non-zero column vector, it is easy to prove that

$$U = I - \frac{2cc'}{c'c} \tag{20}$$

is an orthogonal matrix, i.e., that

$$UU' = I.$$
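Here is a quick numerical check of (20). The vector `c` is arbitrary and chosen only for illustration, and the function name `householder` is a modern label rather than the paper's. $U$ is a reflection, so it should satisfy $UU' = I$ and send $c$ to $-c$.

```python
# Check that U = I - 2cc'/(c'c) from equation (20) is orthogonal (UU' = I).
# The vector c is an arbitrary illustration; any non-zero c should work.

def householder(c):
    """U = I - 2 c c' / (c'c) for a non-zero column vector c."""
    cc = sum(x * x for x in c)
    k = len(c)
    return [[(1.0 if i == j else 0.0) - 2.0 * c[i] * c[j] / cc for j in range(k)]
            for i in range(k)]

c = [1.0, -2.0, 0.5]
U = householder(c)
UUt = [[sum(U[i][k] * U[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
err = max(abs(UUt[i][j] - (1.0 if i == j else 0.0)) for i in range(3) for j in range(3))
Uc = [sum(U[i][k] * c[k] for k in range(3)) for i in range(3)]  # should equal -c
print(err, Uc)
```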
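Substituting (20) in (18) we obtain

$$B = I - Q(Q'Q)^{-1/2}\left(\frac{2cc'}{c'c}\right)(Q'Q)^{-1/2}Q'.$$

Now put

$$(Q'Q)^{-1/2}c = a, \quad\text{or}\quad c = (Q'Q)^{1/2}a,$$

where

$$a = \{a_1, a_2, \ldots, a_r\}$$

is an arbitrary column vector of order $r$. We then get

$$B = I - 2\,\frac{(Qa)(Qa)'}{(Qa)'(Qa)}. \tag{21}$$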
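In particular, if

$$a = \{1, 1, \ldots, 1\},$$

we find that

$$Qa = \begin{pmatrix} -1 & & \\ & \ddots & \\ & & -1 \\ l_{11}/m_1 & \cdots & l_{1r}/m_1 \\ \vdots & & \vdots \\ l_{n1}/m_n & \cdots & l_{nr}/m_n \end{pmatrix} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \left\{-1, \ldots, -1, \; \frac{\sum_\alpha l_{1\alpha}}{m_1}, \ldots, \frac{\sum_\alpha l_{n\alpha}}{m_n}\right\}.$$

The corresponding solution can be written in the form

$$B = I - \frac{2qq'}{q'q},$$

where

$$q = \{q_1, q_2, \ldots, q_r, q_{r+1}, \ldots, q_{r+n}\}$$

and

$$q_1 = q_2 = \cdots = q_r = -1, \qquad q_{r+i} = \frac{\sum_\alpha l_{i\alpha}}{m_i},$$

which is identical with Professor Godfrey Thomson's results (5)
and (6).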
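The sketch below shows Thomson's result emerging numerically. It forms $q$ directly from (6) for hypothetical loadings with $r = 2$ and checks that $B = I - 2qq'/(q'q)$ is orthogonal and leaves $N$ unchanged. The key fact is that $Nq = 0$: for each test the row gives $-(\text{sum of general loadings}) + m_i \cdot (\text{sum of general loadings})/m_i$.

```python
# Thomson's special case: q_1 = ... = q_r = -1 and
# q_{r+i} = (sum of general loadings in test i) / m_i  (equation (6)).
# B = I - 2qq'/(q'q) should satisfy N B = N and B B' = I.  Loadings are hypothetical.

L = [[0.6, 0.3], [0.5, 0.4], [0.7, 0.2], [0.4, 0.5]]
m = [0.5, 0.6, 0.4, 0.7]
n, r = len(L), len(L[0])

q = [-1.0] * r + [sum(L[i]) / m[i] for i in range(n)]   # q = Qa with a = {1, ..., 1}
qq = sum(x * x for x in q)
B = [[(1.0 if i == j else 0.0) - 2.0 * q[i] * q[j] / qq for j in range(r + n)]
     for i in range(r + n)]
N = [list(L[i]) + [m[i] if j == i else 0.0 for j in range(n)] for i in range(n)]

NB = [[sum(N[i][k] * B[k][j] for k in range(r + n)) for j in range(r + n)] for i in range(n)]
BBt = [[sum(B[i][k] * B[j][k] for k in range(r + n)) for j in range(r + n)]
       for i in range(r + n)]
print(max(abs(NB[i][j] - N[i][j]) for i in range(n) for j in range(r + n)))
```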





A METHOD FOR FINDING THE INVERSE OF A MATRIX


LEDYARD R. TUCKER 
The University of Chicago 
The problem of solving simultaneous linear equations is one that
frequently confronts the scientist. Whenever it is desired to state
the parameters as expressions of the constant terms of these
equations, the problem resolves itself into the discovery of the inverse of
the matrix of coefficients. The method here described gives a
routine for the solution of this problem.


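Consider the following set of simultaneous linear equations:

$$\begin{aligned} .80\,b_1 + .48\,b_2 + .36\,b_3 &= d_1, \\ .48\,b_1 + .80\,b_2 + .36\,b_3 &= d_2, \\ .36\,b_1 + .36\,b_2 + .86\,b_3 &= d_3. \end{aligned} \tag{1}$$

These equations may also be written in matrix form:

$$\underbrace{\begin{pmatrix} .80 & .48 & .36 \\ .48 & .80 & .36 \\ .36 & .36 & .86 \end{pmatrix}}_{A} \cdot \underbrace{\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}}_{B} = \underbrace{\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix}}_{D}. \tag{1m}$$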


The problem is to state the "b's" in terms of the "d's."
This problem may be reduced to a new form by defining two new
matrices. Consider the following matrix equation:

$$A^{-1} \cdot \underbrace{\begin{pmatrix} .80 & .48 & .36 \\ .48 & .80 & .36 \\ .36 & .36 & .86 \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{I}. \tag{2m}$$

The matrix with unity in the diagonal cells and zeros in all other cells
is known as the identity matrix. The matrix $A^{-1}$ is that matrix which,
when multiplied by the matrix A, gives the identity matrix as the
product. This definition of $A^{-1}$ is implied in equation (2m). This
matrix $A^{-1}$ is known as the inverse of matrix A. Both members of
equation (1m) may be premultiplied by the matrix $A^{-1}$:
$$A^{-1} \cdot \underbrace{\begin{pmatrix} .80 & .48 & .36 \\ .48 & .80 & .36 \\ .36 & .36 & .86 \end{pmatrix}}_{A} \cdot \underbrace{\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}}_{B} = A^{-1} \cdot \underbrace{\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix}}_{D}.$$

This equation can be simplified by the use of equation (2m) to the
following form:

$$\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{I} \cdot \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = A^{-1} \cdot \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix}.$$

The important property of the identity matrix is that the product of
it and a matrix B is the matrix B. For this reason it has been denoted
the "identity matrix." Using this property, the above equation
becomes:

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = A^{-1} \cdot \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix}. \tag{3m}$$


Thus it is seen that the inverse of a matrix in matrix algebra
corresponds to the reciprocal of a number in ordinary algebra. The
problem is now to find the matrix $A^{-1}$. A method for accomplishing this,
originally developed by A. C. Aitken¹, is here simplified and provided
with complete numerical checks.

The first step in the solution is to build up the matrix (C.0), as
shown in equation (4m). The matrix A is copied in the upper left
section of (C.0). The upper right section has minus ones (-1) in the
cells of the principal diagonal and zeros (0) in all other cells, thus
forming a negative identity matrix. The lower left section is a
positive identity matrix with plus ones (+1) in the principal diagonal.
The lower right section is completely filled with zeros (0). All four
sections are of the same order.

Continuing the construction of equation (4m): the matrix G has
$A^{-1}$ in the upper section and an identity matrix in the lower section.


¹ Thomson, Godfrey H. "Some Points of Mathematical Technique in the
Factorial Analysis of Ability," Journal of Educational Psychology, January 1936,
27, 37.
The matrix H has zeros (0) in the upper half and the matrix $A^{-1}$ in
the lower section. These matrices, G and H, may be indicated in
equation (4m) even though their entries are unknown. The equality of
equation (4m) may be verified by performing the indicated
multiplication; this is done in equation (8m). This multiplication makes use
of equation (2m).
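For orientation, the construction of equation (4m) can be restated compactly in block notation. This is a restatement inferred from the verbal description above, not a reproduction of the original display. The upper block vanishes because, for a square matrix, the inverse defined by (2m) also satisfies $AA^{-1} = I$:

$$\underbrace{\begin{pmatrix} A & -I \\ I & 0 \end{pmatrix}}_{(C.0)} \underbrace{\begin{pmatrix} A^{-1} \\ I \end{pmatrix}}_{G} = \begin{pmatrix} AA^{-1} - I \\ A^{-1} \end{pmatrix} = \underbrace{\begin{pmatrix} 0 \\ A^{-1} \end{pmatrix}}_{H}.$$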

The final equation in the solution is equation (7m). Matrix (C.S)
has zeros (0) in all cells except those in the lower right section, this
section being occupied by $A^{-1}$. That this equation is also an equality
may be readily seen by performing the indicated multiplication. It
will be noted that to find $A^{-1}$ it is only necessary to transform matrix
(C.0) to matrix (C.S) in such a manner as not to alter the equality
to matrix H when (C.S) is postmultiplied by matrix G. This can be
accomplished by a procedure composed of several similar steps. The
first of these steps will be traced through.
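Written in block form, (7m) reads as follows. This is a restatement inferred from the text, not the original display. It shows why the lower-right quarter $X$ of the final matrix must be $A^{-1}$. If the equality with H is preserved at every step, and the upper and left halves end up filled with zeros, then $X \cdot I = A^{-1}$:

$$\underbrace{\begin{pmatrix} 0 & 0 \\ 0 & X \end{pmatrix}}_{(C.S)} \underbrace{\begin{pmatrix} A^{-1} \\ I \end{pmatrix}}_{G} = \begin{pmatrix} 0 \\ X \end{pmatrix} = \underbrace{\begin{pmatrix} 0 \\ A^{-1} \end{pmatrix}}_{H} \quad\Longrightarrow\quad X = A^{-1}.$$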

The first important point in this step of the solution is the
definition of the matrix $(F_q E_p)$ of equation (5m). This matrix is such
that, when postmultiplied by matrix G, it produces a matrix filled with
zeros. It is further defined to have a row equal to one of the rows in
the upper half of (C.0) and a column equal to one of the columns in
the left half of (C.0), these rows and columns occupying corresponding
positions in the two matrices. This condition is shown in equation
(5m), in which $(F_q E_p)$ has row $p$ and column $q$ equal to the same row
and column of (C.0). The procedure for finding this matrix will be
shown later.

When equation (5m) is subtracted from equation (4m), the
result is equation (6m). In this subtraction, matrix G can be factored
out, and matrix H has a zero matrix subtracted from it; therefore,
neither is altered. The matrix (C.1) is defined by equation (9m) and
has entries as given in equation (9).

$$(C.1) = (C.0) - (F_q E_p), \tag{9m}$$

$$(C.1)_{jk} = (C.0)_{jk} - f_j e_k. \tag{9}$$


Since the entries in row $p$ and column $q$ of $(F_q E_p)$ are equal to the
entries in the same row and column of (C.0), the entries in this row
and column of (C.1) are all zeros. The matrix (C.1) is, therefore, one
step nearer (C.S) than is (C.0). In order to complete the solution it
is only necessary to repeat the procedure outlined above. Obtain the
matrix (C.2) from matrix (C.1) in the same manner that (C.1) was
obtained from (C.0), and then proceed from (C.2) in the same
way, until a matrix with the whole upper half and left half filled with
zeros is obtained. This matrix is the matrix (C.S) and contains the
inverse of matrix A as the lower right quarter.
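The whole procedure can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the author's worksheet:

- Matrices are plain lists of rows.
- The function name `tucker_inverse` is hypothetical.
- The pivot is chosen automatically as the largest remaining entry, in absolute value, of the upper-left section. The illustrative problem instead starts from the entry .800.

In exact arithmetic the pivot order does not change the final inverse, only the rounding along the way.

```python
# A sketch of the (C.0) -> (C.S) elimination described above, in plain Python.

def tucker_inverse(A):
    """Return A^{-1} by repeatedly subtracting F_q E_p from (C.0) = [[A, -I], [I, 0]]."""
    n = len(A)
    # (C.0): A upper left, -I upper right, +I lower left, zeros lower right.
    C = [[float(A[i][j]) for j in range(n)] + [-1.0 if k == i else 0.0 for k in range(n)]
         for i in range(n)]
    C += [[1.0 if j == i else 0.0 for j in range(n)] + [0.0] * n for i in range(n)]
    rows_left, cols_left = set(range(n)), set(range(n))
    for _ in range(n):
        # Pivot: largest |entry| of the upper-left section among unused rows/columns.
        p, q = max(((i, j) for i in rows_left for j in cols_left),
                   key=lambda ij: abs(C[ij[0]][ij[1]]))
        pivot = C[p][q]
        if pivot == 0.0:
            raise ValueError("no non-zero pivot left: A is singular")
        E = [x / pivot for x in C[p]]   # row E_p, equation (10)
        F = [row[q] for row in C]       # column F_q, equation (11)
        # Equation (9): subtract F_q E_p entry by entry; row p and column q become zero.
        C = [[C[j][k] - F[j] * E[k] for k in range(2 * n)] for j in range(2 * n)]
        rows_left.discard(p)
        cols_left.discard(q)
    # (C.S): only the lower-right quarter is non-zero, and it holds A^{-1}.
    return [row[n:] for row in C[n:]]

A = [[0.80, 0.48, 0.36],
     [0.48, 0.80, 0.36],
     [0.36, 0.36, 0.86]]
A_inv = tucker_inverse(A)
for row in A_inv:
    print(["%.3f" % x for x in row])
```

At full floating-point precision the result agrees with the three-decimal inverse of the illustrative problem to within about .001. The remaining differences come from the three-decimal rounding carried through Table 1.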

In order to build the matrix $(F_q E_p)$, the matrices $F_q$ and $E_p$
must first be defined. The matrix $E_p$, in equation (10m), is
proportional to row $p$ of matrix (C.0), the constant of proportionality
being the reciprocal of the entry in row $p$ and column $q$ of matrix
(C.0). Equation (10) shows the value of the $k$th entry of $E_p$:

$$e_k = (C.0)_{pk} \cdot \frac{1}{(C.0)_{pq}}. \tag{10}$$

Since, in matrix (C.0) of equation (4m), this row $p$ times the matrix G
is equal to a row of zeros in H, equation (10m) is an equality. In
equation (11m) both sides of equation (10m) are premultiplied by a
column matrix $F_q$, the entries in $F_q$ being equal to the entries in
column $q$ of (C.0), as is shown in equation (11):

$$f_j = (C.0)_{jq}. \tag{11}$$

The matrix on the right side of the equation is obviously filled with
zeros, for it is the product of two matrices, one of which has zeros in
all cells. The matrix $(F_q E_p)$ is the product of matrices $F_q$ and $E_p$.
In equations (12) and (13) it is shown that the entries in row $p$ and
column $q$ of matrix $(F_q E_p)$ are equal to the entries in the same row
and column of matrix (C.0).

$$f_p e_k = (C.0)_{pq} \cdot (C.0)_{pk} \cdot \frac{1}{(C.0)_{pq}} = (C.0)_{pk} = (F_q E_p)_{pk}. \tag{12}$$

Similarly:

$$f_j e_q = (C.0)_{jq} = (F_q E_p)_{jq}. \tag{13}$$


In the actual calculational procedure the matrix (C.0) is built up 
as has been described; that is, it has the matrix A in the upper left 
quarter, minus one’s in the diagonal cells of the upper right quarter, 
plus one’s in the diagonal cells of the lower left quarter, and zeros 
in all other cells. The matrix (C.0) for the illustrative problem is 
shown in Table 1. The selection of row p and column q depends, in 
practice, upon the selection of the entry included in both of them. 
Several factors enter into the selection of the pivot entry. The first 
of these considerations is that it must be located in the upper left sec- 
tion of (C.0) in order to satisfy the conditions already developed in 
the theory of the solution. The second factor to be considered in the 
selection of the pivot entry is that it should be quite large numerically 
in order to minimize computational error. In the illustrative prob- 
lem in Table 1 the entry in the first row and first column of (C.0) 
was selected as the pivot entry. This entry is in the upper left sec- 
tion of (C.0) and is the second largest entry in this section. 

After the pivot entry has been selected, the next step in the
calculation of the inverse is to find the row matrix $E_p$. The reciprocal
of the pivot entry is found first; for example, in the illustrative
problem:

$$1/.800 = 1.250000.$$


An entry in the row $E_p$ is equal to the product of the entry in the
same column of row $p$ and the reciprocal of the pivot entry. The
formula for this calculation is given in equation (10). For example,
in the illustrative problem the second entry in $E_p$ is

$$.480 \times 1.250000 = .600,$$

where .480 is the second entry in the first, or $p$th, row of (C.0). A
check on these calculations can be obtained by multiplying the sum of
row $p$ by the reciprocal of the pivot entry; this product should equal
the sum of the row $E_p$. The formula for this computational check is

$$\sum_k e_k = \Big[\sum_k (C.0)_{pk}\Big] \cdot \frac{1}{(C.0)_{pq}}.$$
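The arithmetic of the first step of Table 1, including the row-sum check, can be reproduced as follows. This is a sketch, and the variable names are illustrative.

```python
# Row E_1 of Table 1 and its sum check (equation (10) and the check formula above).
row_p = [0.800, 0.480, 0.360, -1.000, 0.000, 0.000]   # row I of (C.0)
pivot = row_p[0]                                     # entry in row I, column A
recip = 1.0 / pivot                                  # 1/.800 = 1.25

E1 = [x * recip for x in row_p]                       # e_k = (C.0)_pk / (C.0)_pq
check = sum(row_p) * recip                            # [sum_k (C.0)_pk] / (C.0)_pq

print([round(x, 3) for x in E1])
print(round(sum(E1), 3), round(check, 3))
```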


The column $F_q$ is secured by copying column $q$ of (C.0). In
the illustrative problem the column $F_q$ is a copy of the first, or $q$th,
column of (C.0).

The next step in the solution is to find the matrix (C.1). The
recording of the matrix $(F_q E_p)$ is omitted in practice, the procedure
being to combine its calculation with the subtraction from (C.0). The
equation for each entry in (C.1) is given by equation (9). This
equation states that the entry in row $j$ and column $k$ of (C.1) is equal
to the corresponding entry in (C.0) minus the product of the $j$th
entry in $F_q$ and the $k$th entry in $E_p$. In the illustrative problem the
entry in row II and column C of (C.1) is:

$$.360 - (.480 \times .450) = .144.$$

The .360 is found in row II and column C of (C.0). The entry in
row II of $F_q$ is .480, and the entry in column C of $E_p$ is .450. A
check of these calculations can be obtained by treating the sums of
the columns of (C.0) in the same manner as the individual entries of
(C.0). The entry in the column $F_q$ for this calculation is the sum of
the column $F_q$. The result of this computation should equal the sum
of the corresponding column in (C.1). The formula for the
calculation of this check is:

$$\sum_j (C.1)_{jk} = \sum_j (C.0)_{jk} - e_k \sum_j f_j.$$

In the illustrative problem the check on the sum of column C in (C.1)
is:

$$2.580 - (.450 \times 2.640) = 1.392.$$
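Similarly, the entry in row II, column C of (C.1) and the column-sum check for column C can be reproduced with a few lines. The variable names are illustrative; the numbers are those of (C.0) in Table 1.

```python
# One column of (C.1) by equation (9), plus the column-sum check, for column C of Table 1.
F1 = [0.800, 0.480, 0.360, 1.000, 0.000, 0.000]      # column A of (C.0) = F_1
col_C = [0.360, 0.360, 0.860, 0.000, 0.000, 1.000]   # column C of (C.0)
e_C = 0.450                                         # entry of E_1 in column C

new_col_C = [c - f * e_C for c, f in zip(col_C, F1)]  # (C.1)_jC = (C.0)_jC - f_j e_C
check = sum(col_C) - e_C * sum(F1)                     # 2.580 - .450 * 2.640

print(round(new_col_C[1], 3), round(sum(new_col_C), 3), round(check, 3))
```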

When the entries in (C.1) have been computed, the next portion
of the solution is a duplicate of the foregoing steps. A pivot entry is
selected and the row $E_2$ and column $F_2$ are found. The entries in the
matrix (C.2) are then found from the entries in (C.1), $E_2$, and $F_2$ in the
same manner that the entries in (C.1) were computed from the
entries in (C.0), $E_1$, and $F_1$. The selection of the pivot entry in (C.1)
has the same limitations as the selection of the pivot entry in (C.0),
namely: the pivot entry must be in the upper left section, and it
should be one of the largest entries in this section in absolute value.

The foregoing process is repeated until all the entries in the
upper half and in the left half of the final matrix are zeros. The
inverse of the matrix A is then in the lower right section of this final
matrix, (C.S). This condition is shown in equation (7m) and in
matrix (C.3) of the illustrative problem. A final check of the
calculations may be made by substituting $A^{-1}$ into equation (2m); for the
illustrative problem, that is:

$$\underbrace{\begin{pmatrix} 2.073 & -1.052 & -.427 \\ -1.052 & 2.073 & -.427 \\ -.427 & -.427 & 1.520 \end{pmatrix}}_{A^{-1}} \cdot \underbrace{\begin{pmatrix} .80 & .48 & .36 \\ .48 & .80 & .36 \\ .36 & .36 & .86 \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} 1.000 & .000 & .000 \\ .000 & 1.000 & .000 \\ .001 & .001 & 1.000 \end{pmatrix}}_{I}.$$

The calculation of the inverse of matrix A is then complete.
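The final check can be reproduced in the same way. The sketch multiplies the three-decimal inverse by A. Because the entries of $A^{-1}$ were carried only to three decimals, the product differs from I by a few ten-thousandths, which is what shows up as the .001 entries above.

```python
# Final check of Table 1: multiply the three-decimal A^{-1} by A, as in equation (2m).
A_inv = [[2.073, -1.052, -0.427],
         [-1.052, 2.073, -0.427],
         [-0.427, -0.427, 1.520]]
A = [[0.80, 0.48, 0.36],
     [0.48, 0.80, 0.36],
     [0.36, 0.36, 0.86]]

product = [[sum(A_inv[i][k] * A[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
for row in product:
    print(["%.3f" % x for x in row])

# Largest departure of the product from the identity matrix.
worst = max(abs(product[i][j] - (1.0 if i == j else 0.0))
            for i in range(3) for j in range(3))
```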
TABLE 1

Calculation of $A^{-1}$ for the illustrative problem. Σ is a row or column sum. Ch. is the check value computed from the preceding matrix. E1, E2, E3 and F1, F2, F3 are the rows $E_p$ and columns $F_q$ of successive steps. Rows and columns already reduced to zeros are omitted.

(C.0)      A       B       C   |      I      II     III  |      Σ  |     F1
I       .800    .480    .360   |  -1.000    .000    .000  |   .640  |   .800
II      .480    .800    .360   |    .000  -1.000    .000  |         |   .480
III     .360    .360    .860   |    .000    .000  -1.000  |         |   .360
A      1.000    .000    .000   |    .000    .000    .000  |         |  1.000
B       .000   1.000    .000   |    .000    .000    .000  |         |   .000
C       .000    .000   1.000   |    .000    .000    .000  |         |   .000
Σ      2.640   2.640   2.580   |  -1.000  -1.000  -1.000  |  4.860  |  2.640
E1     1.000    .600    .450   |  -1.250    .000    .000  |   .800 = Ch.
1/.800 = 1.250000

(C.1)      A       B       C   |      I      II     III  |      Σ  |     F2
II              .512    .144   |    .600  -1.000    .000  |   .256  |   .512
III             .144    .698   |    .450    .000  -1.000  |         |   .144
A              -.600   -.450   |   1.250    .000    .000  |         |  -.600
B              1.000    .000   |    .000    .000    .000  |         |  1.000
C               .000   1.000   |    .000    .000    .000  |         |   .000
Ch.            1.056   1.392   |   2.300  -1.000  -1.000  |  2.748  |
Σ              1.056   1.392   |   2.300  -1.000  -1.000  |  2.748  |  1.056
E2             1.000    .281   |   1.172  -1.953    .000  |   .500 = Ch.
1/.512 = 1.953125

(C.2)      A       B       C   |      I      II     III  |      Σ  |     F3
III                     .658   |    .281    .281  -1.000  |   .220  |   .658
A                      -.281   |   1.953  -1.172    .000  |         |  -.281
B                      -.281   |  -1.172   1.953    .000  |         |  -.281
C                      1.000   |    .000    .000    .000  |         |  1.000
Ch.                    1.095   |   1.062   1.062  -1.000  |  2.220  |
Σ                      1.096   |   1.062   1.062  -1.000  |  2.220  |  1.096
E3                     1.000   |    .427    .427  -1.520  |   .334 = Ch.
1/.658 = 1.519757

(C.3)      A       B       C   |      I      II     III  |      Σ
A                              |   2.073  -1.052   -.427  |
B                              |  -1.052   2.073   -.427  |
C                              |   -.427   -.427   1.520  |
Ch.                            |    .594    .594    .666  |  1.854
Σ                              |    .594    .594    .666  |  1.854
ANNOUNCEMENT


A manuscript by Professor N. Rashevsky entitled “Contribution to the 
Mathematical Biophysics of Visual Perception with Special Reference to the The- 
ory of Aesthetic Values of Geometrical Patterns” was scheduled for this issue of 
Psychometrika. Owing to an editorial delay, publication of this article has been
postponed to the December issue.